We report on optimum direct detection of digital data signals that are
transmitted over optical fibers. Direct detection is provided by a photo- 
detector whose output current is modeled as a noisy filtered Poisson stream 
of pulses. In this model, the time-varying pulse arrival rate is proportional 
to a linearly distorted version of the modulating signal. We show how the 
photodetector output is processed to derive the minimum probability-of- 
error receiver. Special attention is given to certain practical limiting cases. 

When the average energy in the response of the photodetector to an indi- 
vidual photon is small compared to the additive thermal noise, the optimum 
detector is shown to be linear except for the use of precomputed bias terms. 
At the other extreme are the photomultiplier and the avalanche photodiode 
where the average energy in the response of the photodetector to a single 
photon is large compared with the additive noise. In this situation, we show 
that the optimum detector estimates the photon arrival times and then uses 
these estimates in a weighted counter. In both limiting cases, the detectors 
are specialized to one-shot M-ary and synchronous multilevel pulse-amplitude
modulated (PAM) signals with intersymbol interference. For
PAM signaling, we demonstrate that finite system memory allows applica- 
tion of dynamic programming to provide a detector implementation whose 
computational complexity does not increase with time. 

I. INTRODUCTION 

In recent years much attention has been focused on communication 
over optical channels [1, 2]. Most early work was concerned with the
physics of the electromagnetic transmission phenomena associated 
with various optical media and with the devices needed to change 
electrical signals to optical ones, and vice versa. In this paper, we are 
concerned with the optimum (maximum likelihood) reception of digital 
data transmitted over the fiber-optic channel. Our work was motivated 
by the many invaluable discussions we have had with S. D. Personick 
on this subject. 
We shall not dwell on the quantum mechanical limitations imposed
on the measurements of signals in the optical frequency range. Instead, 
we adopt a practical approach and assume at the outset that direct 
detection is used to convert optical energy to an electrical signal. This 
is accomplished by using a photodetector prior to any signal processing. 
Thus, we study a classical optical reception problem with the under- 
standing that the photodetector output can be examined in every 
detail so as to extract all relevant information. 

In a fiber-optic communication system, information is conveyed by 
modulating the intensity of a light source, such as a light-emitting 
diode. This is manifested in a photon stream whose arrival times form 
a Poisson process with a time-varying intensity function. The photo- 
detector output current can then be modeled as a noisy filtered 
Poisson process whose intensity function is the sum of a dispersed 
version of the modulating wave and a background dark current. Thus, 
the central problem in communication systems employing a fiber-optic
medium is the detection of the intensity function. Bar-David [3]
and Gagliardi and Karp [4] have considered the optimal reception problem
in the absence of dispersion (intersymbol interference) and additive
thermal noise, while Personick [5-7] and Messerschmitt [8, 9] have
considered linear suboptimum receivers to combat these deleterious effects.

Section II reviews the communication theoretic model of the fiber- 
optic channel. Section III presents two simple examples that are 
intended to focus on certain system essentials and to illustrate some 
fundamental ideas involved in subsequent work. Section IV develops 
a general representation for the likelihood functional. Sections V and 
VI consider reception when the energy in the response of the photo- 
detector to an individual photon is much smaller than the thermal 
noise, while Sections VII and VIII consider the complementary situa- 
tion of large average energy per pulse-to-thermal noise. 



II. A REVIEW OF THE MATHEMATICAL MODEL 

In the past few years, a pragmatic communication theoretic model 
for data transmission over the fiber-optic channel has evolved. The 
papers by Personick [5-7, 10] contain an up-to-date account of this model
and provide more complete references on the physical aspects of
fiber-optic communication. For the purpose of this investigation, it
will suffice to think of the optical modulation process as providing a
proportionate variation in the rate of photon arrivals at the photodetector.
This device, of which there are several types, is a transducer
that converts optical to electrical signals. The photodetector output
current is illustrated in Fig. 1, and can be described as the sum of a
filtered Poisson process

$$I(t) = \sum_{k=1}^{\nu(t)} g_k\, w(t - t_k) \qquad (1)$$

and white gaussian noise, $n(t)$, with spectral density $N_0$.

Fig. 1. Photodetection.

The photon arrival times $t_1, t_2, \ldots$ are a family of independent, identically
distributed random variables, as are the positive gains $g_1, g_2, \ldots$.
Moreover, these two families of random variables are independent of each
other. The pulse w(t) is square-integrable and is the convolution of 
two pulse shapes. The first pulse is the response of the photodetector 
circuitry to the generation of a single charge-carrier (i.e., an electron 
or a hole), while the second pulse is included for mathematical ex- 
pediency so as to whiten the noise at the photodetector output.* We 
distinguish between two types of photodetectors, those that provide 
avalanche gain and those that do not. In the latter category is the 
photodiode that operates with $g_i = 1$, $i = 1, \ldots, \nu$, and results in a
pulse energy-to-noise ratio $\int w^2(t)\,dt / N_0$, which is typically $-20$ dB.
In other words, the response of the photodetector to an individual 
photon is masked by the additive background noise. This is in contrast 
to the photomultiplier and the avalanche photodiode where the gains 
possess a (discrete) probability distribution whose mean, $\bar{g}$, can be
rather large and whose variance is a power ($\ge 1$) of the mean [11]. For
these devices, the average pulse energy-to-noise ratio $\bar{g}^2 \int w^2(t)\,dt / N_0$
can be on the order of 20 dB.

The stochastic process $\nu(t)$, which is the number of pulses generated
at the photodetector output in the interval $(0, t)$, is a Poisson process
with intensity $\lambda(t)$, and therefore

$$\Pr[\nu(t) = N] = \exp\{-\Lambda(t)\}\,\frac{[\Lambda(t)]^N}{N!}, \qquad (2)$$

where

$$\Lambda(t) = \int_0^t \lambda(t')\,dt'. \qquad (3)$$

Moreover, each photon arrival time $t_k$ possesses the probability density

$$p(t_k) = \frac{\lambda(t_k)}{\int \lambda(t')\,dt'}, \qquad (4)$$

where the integral is over the observation time.†

* Note that the inclusion of a reversible operation, such as a whitening filter, does
not affect the performance of an optimum detector.
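As an illustration of (2) to (4), the following Python sketch draws one realization of the photon arrival times: the count is Poisson with mean $\Lambda$ over the observation interval, and the (unordered) times are independent with density proportional to $\lambda(t)$. The function names, the midpoint-rule integration, the Knuth sampler, and the rejection bound `lam_max` are illustrative assumptions and are not part of the paper.

```python
import math
import random

def cumulative_intensity(lam, t_end, steps=1000):
    """Midpoint-rule approximation of Lambda(t_end), the integral of lam over (0, t_end)."""
    dt = t_end / steps
    return sum(lam((i + 0.5) * dt) for i in range(steps)) * dt

def sample_poisson(mean, rng):
    """Draw a Poisson variate by Knuth's product-of-uniforms method (fine for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def sample_arrivals(lam, lam_max, t_end, rng):
    """Draw the count from (2), then unordered i.i.d. arrival times with density (4)."""
    n = sample_poisson(cumulative_intensity(lam, t_end), rng)
    times = []
    while len(times) < n:
        t = rng.uniform(0.0, t_end)
        # Rejection step: accept t with probability lam(t) / lam_max (lam_max must bound lam).
        if rng.random() * lam_max <= lam(t):
            times.append(t)
    return times
```

Averaging the number of returned arrivals over many calls should approach $\Lambda$ evaluated at the end of the observation interval, which is a simple sanity check on any chosen intensity.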

In the digital fiber-optic communication system under discussion
here, the positive intensity function $\lambda(t)$ is the information-bearing
signal and is the average rate of electrons produced by the photodetector.
The manner in which $\lambda(t)$ is manifest in the received optical signal
(the photodetector input) is through the relation

$$\lambda(t) = k\,\mathcal{P}(t) + \lambda_0, \qquad (5)$$

where $\mathcal{P}(t)$ is the received optical power, $k$ is a constant conversion
factor, and $\lambda_0$ is the average dark, or ambient, current in "counts"
per second.‡ Thus, information is transmitted by modulating the
optical power and must be recovered by processing the noisy photodetector
output, $I(t) + n(t)$. As a result of transmitting the optical
signal through the fiber-guide medium, the intensity function at the
photodetector output will be the sum of a linearly distorted version of
the transmitter intensity and the dark current. In the sequel, $\lambda(t)$ will
be understood to mean the intensity function at the receiver.

Statistical averages of $I(t)$ are found by elementary calculations.
For example,

$$E[I(t)] = E(g) \int_{-\infty}^{\infty} \lambda(\tau)\, w(t - \tau)\, d\tau \qquad (6)$$

and

$$\sigma_I^2(t) = E(g^2) \int_{-\infty}^{\infty} \lambda(\tau)\, w^2(t - \tau)\, d\tau, \qquad (7)$$

where $E(g)$ and $E(g^2)$ are the average and average square of the
avalanche gain $g$. Higher moments can also be readily evaluated.
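The two moments in (6) and (7) can be evaluated numerically for any intensity and pulse shape. The sketch below is a minimal illustration using a midpoint rule; the function name, the finite integration limits `tau_lo` and `tau_hi`, and the step count are assumptions made for the example.

```python
import math

def filtered_poisson_moments(lam, w, t, g_mean, g_mean_sq, tau_lo, tau_hi, steps=4000):
    """Approximate the mean (6) and variance (7) of I(t) by a midpoint rule over tau."""
    d_tau = (tau_hi - tau_lo) / steps
    mean_sum = 0.0
    var_sum = 0.0
    for i in range(steps):
        tau = tau_lo + (i + 0.5) * d_tau
        pulse = w(t - tau)
        mean_sum += lam(tau) * pulse          # integrand of (6)
        var_sum += lam(tau) * pulse * pulse   # integrand of (7)
    return g_mean * mean_sum * d_tau, g_mean_sq * var_sum * d_tau
```

For a constant intensity and a one-sided exponential pulse $w(u) = e^{-u}$, $u \ge 0$, the mean approaches $E(g)\lambda$ and the variance approaches $E(g^2)\lambda/2$ once $t$ is several time constants past the start of the interval.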

A linear channel model with additive "noise" is suggested by (6)
and (7). In such a model, the desired signal is taken to be the average
value of $I(t)$, namely $\lambda(t)$ passed through a filter with impulse response
$E[g]\,w(t)$. One component of the added noise can be thought of as the
signal-dependent process $I(t) - E[I(t)]$, which has mean zero and
variance given by (7). In addition to this noise, the gaussian noise
must also be included before processing. While this linear model is
a convenient approximation in some situations [5-10], for purposes of this
investigation we work with the process $I(t)$ directly.

† Note that the arrival times are not assumed to be ordered.

‡ In free-space optical communication systems, $\lambda(t)$ must be regarded as having a
noisy component.

Now that all the physical parameters have been defined, the optimum
detection problem can be stated as follows:

Given that the intensity function can assume one of $M$ equiprobable
positive functions $\lambda_m(t)$, $0 \le t \le \mathcal{T}$, $m = 1, \ldots, M$, the
task of the detector is to decide which one of the $M$ intensities has
been transmitted after processing $I(t)$ plus gaussian noise for $\mathcal{T}$
seconds. Of particular interest is the synchronous pulse-amplitude
modulated (PAM) signal

$$\lambda(t) = \sum_k a_k f(t - kT) + \lambda_0,$$

where each data bit, $a_k$, assumes the value 0 or 1, $1/T$ is the data
rate in bits/s, and $f(t)$ is a positive time-dispersed pulse.

The subject of our investigation is summarized by the question:
How should the photodetector output, $I(t) + n(t)$, be processed so
as to minimize the probability of error?
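To make the PAM intensity concrete, the following Python sketch evaluates $\lambda(t)$ for a given bit sequence. The function name, the choice to index the bits from $k = 0$, and the triangular pulse in the usage note are assumptions for illustration only.

```python
def pam_intensity(bits, pulse, symbol_period, dark_rate, t):
    """Evaluate lambda(t) = sum_k a_k f(t - k T) + lambda_0 for bits a_0, a_1, ..."""
    return dark_rate + sum(
        a * pulse(t - k * symbol_period) for k, a in enumerate(bits)
    )

def triangular_pulse(u):
    """Hypothetical dispersed pulse spanning two symbol periods (T = 1), peak 1 at u = 1."""
    return max(0.0, 1.0 - abs(u - 1.0))
```

Because the assumed pulse lasts two symbol periods, neighboring symbols overlap: with bits `[1, 1]` and `symbol_period = 1`, the intensity at `t = 1.5` contains contributions from both pulses, which is the intersymbol interference the detector must contend with.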

III. A MOTIVATING SIMPLIFIED DISCRETE MODEL— TWO EXAMPLES 

To preview, in an elementary way, some ideas that are more fully 
developed in the sequel and also to serve as a motivation to the reader, 
we present a simplified version of the model discussed in the last 
section. 

In a simplified theoretical model, the time index $t$ is assumed to take
on the discrete set of values $t_1, t_2, \ldots, t_J$, where $t_j = j\Delta$. Thus, instead
of writing

$$I(t) = \sum_k g_k\, w(t - t_k)$$

for the photodetector response to a photon stream, we write

$$I(t_j) = \sum_k g_k q_k\, w(t_j - t_k), \qquad j = 1, 2, \ldots, J. \qquad (8)$$

In the above expression, $\{q_k\}_{k=1}^{J}$ can be regarded as an independent
Bernoulli sequence with probabilities†

$$\Pr\{q_k = 1\} = \lambda_k \quad \text{and} \quad \Pr\{q_k = 0\} = 1 - \lambda_k,$$

where we have in mind that $0 < \lambda_k \ll 1$. Thus, $q_k = 1$ (or 0) represents
the arrival (or nonarrival) of a photon at time $t_k$. We make the further
simplifying assumption that $w(t_j - t_k) = A\delta_{jk}$ ($A$ a positive constant),
where $\delta_{jk}$ is the Kronecker delta and is nonzero only when $j = k$.
This corresponds to assuming that the pulses $w(t)$ and $w(t - \Delta)$ do
not overlap. Within this simplified framework, the received time-discrete
signal is of the form

$$I(t_j) = g_j q_j A, \qquad j = 1, 2, \ldots, J. \qquad (9)$$

We recall that $\{\lambda_j^{(m)}\}_{j=1}^{J}$ is the intensity function associated with the $m$th
hypothesis. The particular intensity which is active is, of course,
unknown at the receiver beyond the knowledge of the finite set from
which it was chosen. The last ingredient of our model is to include the
fact that the observation $I(t_j)$ is noisy and is given by

$$y(t_j) = g_j q_j A + n_j, \qquad (10)$$

where the noise samples are assumed to be gaussian, independent, and
zero-mean and have variance $N_0$. In relation to the more accurate
model of the previous section, $\sqrt{N_0}$ can be thought of as the standard
deviation corresponding to $\int_{t_j}^{t_{j+1}} n(t)\,dt$. As is well known, the optimum
detector computes the likelihood (the a posteriori probability density
of the received signal conditioned on each hypothesis, in this case
the intensity) and selects the maximum. In statistical parlance, this
is a standard multihypothesis testing problem. We now develop the
form of the likelihood for two different assumptions on the nature of
generation of secondary electrons:

(i) No avalanche gain ($g_j = 1$).

(ii) Discrete avalanche gain ($g_j$ takes on values $1, 2, \ldots, G$, with
probabilities $p_1, p_2, \ldots, p_G$).

In each case, we first obtain the likelihood for one observation. Owing
to the nonoverlapping assumption on the pulses and the independent
noise samples, the likelihood for $J$ independent observations is given
as a product. Our goal is to obtain a simple representation for the
effective‡ likelihood $L^{(m)}(\lambda_1, \lambda_2, \ldots, \lambda_J;\ y_1, y_2, \ldots, y_J)$, where the
superscript $m$ denotes which intensity is assumed active. Given the
received samples $y_1, \ldots, y_J$, the maximum likelihood (optimum)
receiver selects the index $m^*$ that maximizes $L^{(m)}$ and declares that
intensity $\lambda^{(m^*)}$ is present. We shall find that, if $N_0$ is small, then the
likelihood assumes an especially simple form. Specifically, in the high
signal-to-noise ratio case, the likelihood is of the form

$$L^{(m)} \sim \prod_{j=1}^{J} \left(\lambda_j^{(m)}\right)^{\hat{q}_j} \left(1 - \lambda_j^{(m)}\right)^{1 - \hat{q}_j}, \qquad (11)$$

where $\hat{q}_j = 1$ if $y_j \ge y_T$ (and zero otherwise). The quantity $y_T$ is a
threshold value that we shall derive for each example. Alternatively,
the log-likelihood is expressible as the weighted counter

$$\sum_{j=1}^{J} \left[\hat{q}_j \log \lambda_j^{(m)} + (1 - \hat{q}_j) \log\left(1 - \lambda_j^{(m)}\right)\right], \qquad (12)^*$$

where $\hat{q}_j$ is an estimated photon arrival process. In the complementary
case of small signal-to-noise ratio ($N_0 \to \infty$), the detector is of the
matched-filter or correlator type. The effective likelihood in this
case is

$$L^{(m)} \sim c \sum_{j=1}^{J} \lambda_j^{(m)} y_j - b^{(m)}, \qquad (13)$$

where $c$ is a constant and $b^{(m)}$ is a hypothesis-sensitive bias term. We
now turn to the specific examples.

† For convenience, we have taken $\Delta = 1$, and so we have written $\lambda_k$ and $1 - \lambda_k$
instead of $\lambda_k\Delta$ and $1 - \lambda_k\Delta$.

‡ "Effective" refers to the fact that constants common to all hypotheses are
dropped.

* The reader familiar with Ref. 3 might expect an additional $-\Lambda$ term in (12).
Owing to the simplified Bernoulli model employed above, this is not the case. However,
the more refined analysis in the sequel will include this term.

(i) The Photodiode (No Avalanche Gain)

The single observation $y_j$ is defined as

$$y_j = n_j, \quad \text{with probability } 1 - \lambda,$$

and

$$y_j = A + n_j, \quad \text{with probability } \lambda. \qquad (14)$$

We temporarily drop the subscripts dealing with time ($j$) and hypothesis
($m$) while investigating this single observation. The likelihood
is the mixture probability density

$$p(y) = \frac{1 - \lambda}{\sqrt{2\pi N_0}} \exp\left\{-\frac{y^2}{2N_0}\right\} + \frac{\lambda}{\sqrt{2\pi N_0}} \exp\left\{-\frac{(y - A)^2}{2N_0}\right\}. \qquad (15)$$

Noticing that the common factor $(2\pi N_0)^{-1/2}\exp\{-y^2/2N_0\}$ is insensitive to the
hypothesis ($\lambda$), the effective likelihood becomes

$$L(y) = (1 - \lambda) + \lambda \exp\left\{\frac{Ay}{N_0} - \frac{A^2}{2N_0}\right\}. \qquad (16)$$

A simple calculation shows that the two terms in (16) are equal when
$y = y_T$, where

$$y_T = \frac{A}{2} + \frac{N_0}{A} \log\left(\frac{1 - \lambda}{\lambda}\right). \qquad (17)$$

For small $N_0$, $y_T \approx A/2$ and the graph of $L(y)$ converges to the solid
line shown in Fig. 2. So, for $(N_0/A^2)\,|\log \lambda|$ small, the effective likelihood
can be approximated as

$$\tilde{L}(y) = \begin{cases} 1 - \lambda, & y \le y_T \\ \lambda \exp\left\{\dfrac{Ay}{N_0} - \dfrac{A^2}{2N_0}\right\}, & y > y_T. \end{cases} \qquad (18)$$

The sense of the approximation is expressed by the following easily
proven statement: For each $\delta > 0$, one can find an $N_0 > 0$ so that

$$\Pr\left[\,\left|\frac{L(y)}{\tilde{L}(y)} - 1\right| > \delta\,\right] = 0. \qquad (19)$$

Fig. 2. Convergence of graph of $L(y)$ to the asymptotic form ($N_0 \to 0$).

To simplify the likelihood, note that $\exp\{(Ay/N_0) - (A^2/2N_0)\}$ and
$y_T$ are hypothesis-insensitive and can be deleted from the effective
likelihood, and since we are assuming that $\lambda$ is extremely small, $1 - \lambda$
can be treated as 1. The effective likelihood is then simply

$$L^{(m)} = \left[\lambda_j^{(m)}\right]^{\hat{q}_j}, \qquad (20)$$

where $\hat{q}_j = 1$ if $y_j > y_T$ and zero otherwise. Because of the independence
of the noise samples and the nonoverlapping property of the
pulses, the likelihood for $J$ observations is the product

$$L^{(m)} = \prod_{j=1}^{J} \left[\lambda_j^{(m)}\right]^{\hat{q}_j}, \qquad (21)$$

which yields the weighted counter

$$\log L^{(m)} = \sum_{j=1}^{J} \hat{q}_j \log \lambda_j^{(m)} \qquad (22)$$

shown in Fig. 3. The receiver selects the index that maximizes (22) and
declares that the corresponding intensity was transmitted.

Fig. 3. Threshold-based weighted counter.
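The threshold rule and weighted counter described above can be sketched in a few lines of Python. This is a minimal illustration under the simplified discrete model; the function names and the use of a single common threshold for all hypotheses (reasonable when $N_0$ is small, so that $y_T \approx A/2$) are assumptions.

```python
import math

def threshold(amplitude, noise_var, lam):
    """y_T from (17): the point where the two terms of (16) are equal."""
    return amplitude / 2.0 + (noise_var / amplitude) * math.log((1.0 - lam) / lam)

def weighted_counter(samples, intensities, y_t):
    """Log-likelihood (22): sum of log(lambda_j) over slots whose sample exceeds y_t."""
    return sum(
        math.log(lam_j) for y_j, lam_j in zip(samples, intensities) if y_j > y_t
    )

def detect(samples, hypotheses, y_t):
    """Index of the hypothesis (a list of per-slot intensities) that maximizes (22)."""
    scores = [weighted_counter(samples, lams, y_t) for lams in hypotheses]
    return max(range(len(hypotheses)), key=scores.__getitem__)
```

For example, with `y_t = 0.5`, samples `[0.02, 0.97, 1.05, -0.1]` produce estimated arrivals in the second and third slots, so a hypothesis whose intensity is concentrated in those slots scores higher than one concentrated in the first and last.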

In the complementary case of low signal-to-noise ratio ($N_0 \to \infty$),
we expand the likelihood function in a Taylor series and retain the
dominating terms. This step must be done with care, since the numerator
of the exponent has variance $N_0$, while $N_0$ also appears in the
denominator. By normalizing the exponent, it is seen that the variance
of the exponent is proportional to $1/N_0$; thus, the exponent will be
small and a series expansion is useful. Keeping the first two terms in
such an expansion of (16) gives

$$L(y) \simeq 1 + \lambda\left(\frac{Ay}{N_0} - \frac{A^2}{2N_0}\right), \qquad (23)$$

and the likelihood for $J$ observations becomes† the digital correlator
(matched filter)

$$\log L^{(m)} \simeq \sum_{j=1}^{J} \lambda_j^{(m)} \left(\frac{A y_j}{N_0} - \frac{A^2}{2N_0}\right), \qquad (24)$$

which is shown in Fig. 4.

† We have used the fact that $(\lambda_j/N_0)[A y_j - (A^2/2)] \ll 1$, and with $\epsilon \ll 1$ that $\log(1 + \epsilon) \simeq \epsilon$.
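A minimal Python sketch of the low signal-to-noise correlator follows. It assumes the per-slot expansion (23) is applied to each of the $J$ independent observations and that the logarithm of each factor is replaced by its first-order term, which gives a correlation of the samples with the hypothesized intensity minus a bias, as in the general form (13). The function names are illustrative.

```python
import math

def correlator_statistic(samples, intensities, amplitude, noise_var):
    """sum_j lambda_j * (A*y_j/N0 - A^2/(2*N0)): a correlation term minus a bias."""
    gain = amplitude / noise_var
    correlation = gain * sum(lam_j * y_j for lam_j, y_j in zip(intensities, samples))
    bias = gain * amplitude / 2.0 * sum(intensities)
    return correlation - bias

def exact_log_likelihood(samples, intensities, amplitude, noise_var):
    """Logarithm of the product over slots of the effective likelihood (16)."""
    return sum(
        math.log(
            (1.0 - lam_j)
            + lam_j * math.exp((amplitude * y_j - amplitude**2 / 2.0) / noise_var)
        )
        for lam_j, y_j in zip(intensities, samples)
    )
```

When `noise_var` is large compared with `amplitude**2`, the two functions agree closely, which is one way to check how small the signal-to-noise ratio must be before the correlator is a faithful stand-in for the exact likelihood.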

Fig. 4. Elementary version of digital correlator.

1398 THE BELL SYSTEM TECHNICAL JOURNAL, OCTOBER 1975

(ii) The Photomultiplier or Avalanche Photodiode (Discrete Avalanche)

Again, we start with the single observation case but now, because
of the avalanche mechanism, a single primary gives rise to 1 or 2 or
$\ldots$, $G$ secondaries with probabilities $p_1, p_2, \ldots, p_G$, respectively
($\sum_{l=1}^{G} p_l = 1$). So the measurement $y$ is modified as

$$y = \begin{cases} n, & \text{with probability } 1 - \lambda \\ A + n, & \text{with probability } \lambda p_1 \\ \quad\vdots \\ GA + n, & \text{with probability } \lambda p_G. \end{cases} \qquad (25)$$

The likelihood is the mixture density

$$p(y) = \frac{1 - \lambda}{\sqrt{2\pi N_0}} \exp\left\{-\frac{y^2}{2N_0}\right\} + \sum_{l=1}^{G} \frac{\lambda p_l}{\sqrt{2\pi N_0}} \exp\left\{-\frac{(y - lA)^2}{2N_0}\right\}. \qquad (26)$$

Factoring out hypothesis-insensitive terms, the effective likelihood
becomes

$$L(y) = (1 - \lambda) + \lambda \sum_{l=1}^{G} p_l \exp\left\{\frac{lAy}{N_0} - \frac{l^2A^2}{2N_0}\right\}. \qquad (27)$$

As $N_0 \to 0$, we notice that $L(y) \sim (1 - \lambda)$ for $y < A/2$. When
$y > A/2$, let $lA$ denote the number $A, 2A, \ldots,$ or $GA$ that is closest
to $y$. Then the series appearing in (27) will be dominated by one term,
and the likelihood becomes

$$L(y) \sim p_l\, \lambda \exp\left\{\frac{lAy}{N_0} - \frac{l^2A^2}{2N_0}\right\} \quad \text{as } N_0 \to 0.$$

Proceeding as in the previous example, we consider both $N_0$ and $\lambda$
small and drop hypothesis-insensitive terms from the approximate
likelihood to obtain

$$\tilde{L}(y) = [\lambda]^{\hat{q}}, \qquad (28)$$

where $\hat{q} = 1$ when $y \ge A/2$ and zero otherwise.† Moreover, note that
the threshold is the same as in the nonavalanche case. This is because
the detector is only interested in ascertaining whether or not a photon
has arrived and need not estimate the magnitude of the avalanche gain.
Again, for $J$ measurements, the corresponding log-likelihood expression
is simply the weighted counter

$$\log L = \sum_{j=1}^{J} \hat{q}_j \log \lambda_j. \qquad (29)$$

As $N_0 \to \infty$, we again expand the likelihood (27) in a Taylor series to
obtain

$$L(y) \simeq 1 - \lambda\left(1 - \sum_{l=1}^{G} p_l\left[1 + \frac{lAy}{N_0} - \frac{l^2A^2}{2N_0}\right]\right), \qquad (30)$$

which, for $J$ measurements, becomes

$$\log L^{(m)} \simeq \sum_{j=1}^{J} \sum_{l=1}^{G} \lambda_j^{(m)} p_l \left[\frac{lAy_j}{N_0} - \frac{l^2A^2}{2N_0}\right]. \qquad (31)$$

The above is again interpreted as a correlator where $\lambda_j^{(m)}$ is correlated
with $(Ay_j/N_0)\sum_{l=1}^{G} l\,p_l = (Ay_j/N_0)\,E[g]$.
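The behavior of the avalanche likelihood (27) at small $N_0$ can be checked numerically. The sketch below evaluates (27) directly and identifies the gain level closest to the observation; the function names and the convention that `gain_probs[l - 1]` holds $p_l$ are assumptions for illustration. For very small `noise_var` the exponentials can overflow, so a log-domain evaluation would be preferable in practice.

```python
import math

def avalanche_likelihood(y, lam, gain_probs, amplitude, noise_var):
    """Effective likelihood (27); gain_probs[l - 1] is p_l for l = 1, ..., G."""
    series = sum(
        p * math.exp((l * amplitude * y - 0.5 * (l * amplitude) ** 2) / noise_var)
        for l, p in enumerate(gain_probs, start=1)
    )
    return (1.0 - lam) + lam * series

def dominant_gain(y, gain_probs, amplitude):
    """Gain index l whose level l*A is closest to y (the dominant term as N0 -> 0)."""
    return min(range(1, len(gain_probs) + 1), key=lambda l: abs(y - l * amplitude))
```

With a single gain level (`gain_probs = [1.0]`) the function reduces to the photodiode likelihood (16), and for small `noise_var` the value of (27) is essentially that of the single term selected by `dominant_gain`.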

IV. THE MAXIMUM LIKELIHOOD DETECTOR 

Here, we begin to answer the question posed at the end of Section 
II by presenting a derivation of the likelihood function associated 
with the received signal. The likelihood function is the probability 
measure of the photodetector output, given that a particular intensity 
is active. It is well known [12] that, when one of $M$ equally likely signals
$\lambda_m(t)$ is transmitted, the optimum (minimum probability of error)
detector computes the $M$ values of the likelihood function evaluated
at the received waveform and declares that the $j$th signal was sent,
where the $j$th likelihood function is the largest.

† As expected when $N_0 \to 0$, the avalanche gain provides no essential benefit. A
more interesting asymptotic evaluation and one that is more akin to reality is obtained
by parameterizing the gain distribution such that $E[g]/N_0 \to \infty$.
We denote the received signal by

$$y(t) = I_m(t) + n(t), \qquad 0 \le t \le \mathcal{T}, \qquad (32)$$

where $I_m(t)$ is the information-carrying, filtered, Poisson process

$$I_m(t) = \sum_{k=1}^{\nu(\mathcal{T})} g_k\, w(t - t_k), \qquad (33)$$

and where the index $m$ [corresponding to $\lambda_m(t)$] is hidden in the
statistics of $\{t_k\}$ and $\nu(t)$. These statistics are described by (2) to (4)
with $\lambda(t)$ replaced by $\lambda_m(t)$.

The task of the optimum receiver is thus to process the photodetector
output $y(t)$ for $\mathcal{T}$ seconds and then decide which intensity function
$\lambda_m(t)$, $m = 1, 2, \ldots, M$, is in effect. As we have mentioned earlier,
the random variables $\{g_k\}$ represent the avalanche gains, and the pulse
shape $w(t)$ is so far arbitrary with the only requirement being finite
energy. Although in actual practice the noise at the output of the
photodetector is not white, it can be whitened by a filter before additional
processing and the effect of this filter will be manifest in the
shape of $w(t)$.

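The conditional likelihood function [when $I_m(t)$ is fixed] has the
standard form [13]

$$L_m[y \mid I_m] = \exp\left\{\frac{1}{N_0}\int_0^{\mathcal{T}} I_m(t)\,y(t)\,dt - \frac{1}{2N_0}\int_0^{\mathcal{T}} I_m^2(t)\,dt\right\}. \qquad (34)$$

The desired likelihood is the expectation of (34) with respect to $I_m(t)$
for fixed $m$, i.e.,

$$L_m(y) = E_I\{L_m[y \mid I_m]\}. \qquad (35)$$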

Once the intensity $\lambda_m(t)$ is specified, the above expectation is taken
with respect to the number of arrivals, the arrival times, and the 
avalanche gain values. The detailed evaluation of this expectation and 
the interpretation of the resulting structures, in terms of implementable 
physical operations on y(t), is our objective. The exact structure is 
sufficiently complex that many judicious approximations will have to 
be made to glean the essential nature of the operations. 

We remark that a representation of (35) in terms of an estimator-correlator
structure has recently been treated in the literature [12, 14-16].
The optimum detector has been shown to be a correlation detector
in which the deterministic signal in the classical correlator is replaced by its
least-squares estimate. This is a reformulation of the detection problem
in terms of an estimation problem. Proponents of this method have
taken the viewpoint that various suboptimum detectors are suggested
by this formulation. A typical approach might be to replace the least-squares
estimate by the linear least-squares estimate or some other
approximation, and to approximate the resulting stochastic integral 
by conventional integrals. While this might be reasonable, it does not 
indicate the direction of the approximation. We prefer an approach 
that, to be sure, has many approximations and makes use of estimates 
in place of the true quantities, but that can be explicitly related to the 
optimum detector under the asymptotic conditions of large and small 
signal-to-noise ratio. 

Toward this end, we proceed by writing (35) in more detail. Neglecting
edge effects on the integrals and assuming that the observation
time $\mathcal{T}$ is much larger than the effective duration of a single pulse $w(t)$,
we can express the inner product and the square term indicated in
(34) as

$$\int_0^{\mathcal{T}} I_m(t)\,y(t)\,dt = \sum_{k=1}^{\nu} g_k P(t_k), \qquad (36)$$

where

$$P(t_k) = \int_0^{\mathcal{T}} w(t - t_k)\,y(t)\,dt.$$

The square term is written as

$$\int_0^{\mathcal{T}} I_m^2(t)\,dt = \sum_{k,j=1}^{\nu} g_k g_j R(t_k - t_j), \qquad (37)$$

where $R(t) = \int_{-\infty}^{\infty} w(\tau)\,w(t - \tau)\,d\tau$ is defined as the pulse correlation
function.

Substituting (36) and (37) into (35), we obtain

$$L_m(y) = E_I \exp\left\{\frac{1}{N_0}\sum_{k=1}^{\nu} g_k P(t_k) - \frac{1}{2N_0}\sum_{k,j=1}^{\nu} g_k g_j R(t_k - t_j)\right\}. \qquad (38)$$

Employing the vector notation $\mathbf{g}_\nu = (g_1, g_2, \ldots, g_\nu)$ and
$\mathbf{t}_\nu = (t_1, t_2, \ldots, t_\nu)$ gives the expression

$$L_m(y) = E_{\mathbf{t}_\nu, \mathbf{g}_\nu, \nu}\left[\exp\left\{\frac{1}{N_0}\sum_{k=1}^{\nu} g_k P(t_k) - \frac{1}{2N_0}\sum_{k,j=1}^{\nu} g_k g_j R(t_k - t_j)\right\}\right], \qquad (39)$$

and after performing the indicated expectations we obtain a detailed
representation of the likelihood function

$$L_m(y) = \exp[-\Lambda_m(\mathcal{T})] \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{\mathbf{g}_n} \int d\mathbf{t}_n \prod_{k=1}^{n} \lambda_m(t_k) \prod_{i=1}^{n} p(g_i)$$
$$\times \exp\left\{\frac{1}{N_0}\sum_{k=1}^{n} g_k P(t_k) - \frac{1}{2N_0}\sum_{k,j=1}^{n} g_k g_j R(t_k - t_j)\right\}, \qquad (40)$$

where $p(g_i)$ is the (discrete) probability density function of the avalanche
gains and where it is understood that, when $n = 0$, the summand
is taken to be unity.
To more easily interpret and/or mechanize the likelihood calculations,
it will be convenient in some applications to assume that the
photon arrivals can only occur at discrete instants of time $\{j\Delta\}$, where
$\Delta$ is some fixed (small) interval and $j = 0, 1, 2, \ldots, J$. The integer $J$
will be defined as the closest integer to $\mathcal{T}/\Delta$. This assumption is easily
accommodated in (40) by replacing $\int d\mathbf{t}_n$ with a multidimensional
sum $\sum_{\mathbf{t}_n}$ over the lattice $\{t_k = j\Delta : k = 1, 2, \ldots, n;\ j = 0, 1, \ldots, J\}$,
and by replacing $\lambda(t_k = j\Delta)$ with the probability that† $j\Delta - \Delta/2
\le t_k \le j\Delta + \Delta/2$. The likelihood function under this set of assumptions
then becomes

$$L_m(y) = \exp[-\Lambda_m(\mathcal{T})] \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{\mathbf{t}_n} \sum_{\mathbf{g}_n} \prod_{k=1}^{n} \lambda_m(t_k) \prod_{i=1}^{n} p(g_i)$$
$$\times \exp\left\{\frac{1}{N_0}\left[\sum_{k=1}^{n} g_k P(t_k) - \frac{1}{2}\sum_{k,j=1}^{n} g_k g_j R(t_k - t_j)\right]\right\}, \qquad (41)$$

which will be referred to as the (time) discrete likelihood function.

† This probability is given by

$$\int_{j\Delta - \Delta/2}^{j\Delta + \Delta/2} \lambda(t)\,dt \approx \lambda(j\Delta)\cdot\Delta.$$

The two infinite functional series, (40) and (41), are not of much
use as they stand. However, under a variety of physically realistic
situations and by making suitable physical approximations as well
as asymptotic expansions, we shall be able to deduce from these representations
real-time implementable signal-processing algorithms.

By suitably normalizing the likelihood functions, (40) and (41),
$1/N_0$ can be replaced by the (pulse) signal-to-noise ratio. This parameter
$\alpha^2$ will play a central role in our subsequent treatment, and its
relative size will dictate our particular approach. The normalization
entails replacing $R(t)$ by $R(t)/R(0)$, $P(t_k)$ by $P(t_k)/[R(0)\bar{g}]$, and the
random variables $g_k$ by $g_k/\bar{g}$, where $\bar{g} = E g_k$; consequently,

$$\alpha^2 = \frac{\bar{g}^2 R(0)}{N_0}$$

and may be viewed as an average pulse signal-to-noise ratio. As we
have discussed in the preceding section, in some applications this
parameter is small, while in others it is large. Thus, our investigations
in the sequel will focus on these two ranges. Additionally, different
treatments of the likelihood ratio are also required, depending upon
the presence or absence of avalanche gain.

It is instructive to give a still different representation for the likelihood,
which will be found useful in the sequel. Towards this end, we
introduce a zero-mean, stationary gaussian process $x(t)$ with correlation
function,

$$E[x(t)\,x(t + \tau)] = R(\tau),$$

and can then write (39) in the form

$$L_m(y) = E_{\mathbf{t}_\nu, \mathbf{g}_\nu, \nu}\left[\exp\left\{\alpha^2 \sum_{k=1}^{\nu} g_k P(t_k)\right\} E_x \exp\left\{i\alpha \sum_{k=1}^{\nu} g_k x(t_k)\right\}\right], \qquad (42)$$

where we have used the elementary identity for gaussian processes

$$\exp\left\{-\frac{\alpha^2}{2} \sum_{k,j=1}^{\nu} g_k g_j R(t_k - t_j)\right\} = E_x \exp\left\{i\alpha \sum_{k=1}^{\nu} g_k x(t_k)\right\}.$$
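The gaussian identity used to pass from (39) to (42) can be checked by simulation for a small case. The Python sketch below treats two arrival times, so that $x(t_1)$ and $x(t_2)$ are unit-variance gaussian variables with correlation $\rho = R(t_1 - t_2)$ (normalized so that $R(0) = 1$); the function name, the two-point restriction, and the Monte Carlo approach are assumptions for illustration.

```python
import math
import random

def characteristic_check(alpha, gains, rho, trials, rng):
    """Compare a Monte Carlo estimate of E_x exp{i*alpha*(g1*x1 + g2*x2)} with the closed form."""
    g1, g2 = gains
    total = 0.0
    for _ in range(trials):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = z1
        x2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2  # correlation rho with x1
        # The imaginary part averages to zero by symmetry, so only the cosine is kept.
        total += math.cos(alpha * (g1 * x1 + g2 * x2))
    monte_carlo = total / trials
    # Double sum over k, j of g_k g_j R(t_k - t_j) for two points with R(0) = 1.
    quadratic = g1 * g1 + g2 * g2 + 2.0 * g1 * g2 * rho
    closed_form = math.exp(-0.5 * alpha**2 * quadratic)
    return monte_carlo, closed_form
```

The two returned values should agree to within the sampling error, which shrinks as the inverse square root of `trials`.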



Since, over the observation interval, (42) is absolutely integrable,
the expectation with respect to $x$ and the other random variables may
be interchanged. By noting that

$$E_{\mathbf{t}_\nu, \mathbf{g}_\nu, \nu} \exp\left\{i\alpha \sum_{k=1}^{\nu} g_k x(t_k)\right\} = \exp(-\Lambda_m) \cdot \exp\left\{\sum_g \int_0^{\mathcal{T}} p(g)\,\lambda_m(t) \exp[i\alpha g\,x(t)]\,dt\right\}, \qquad (43)$$

we can write (42) in the form

$$L_m(y) = \exp(-\Lambda_m)\, E_x \exp\left\{\sum_g \int_0^{\mathcal{T}} p(g)\,\lambda_m(t) \exp[\alpha^2 g P(t) + i\alpha g\,x(t)]\,dt\right\}. \qquad (44)$$

In particular, in the absence of avalanche gain, i.e., $p(g) = \delta(g - 1)$,
(44) assumes the compact form

$$L_m(y) = \exp(-\Lambda_m)\, E_x \exp\left\{\int_0^{\mathcal{T}} \lambda_m(t) \exp[\alpha^2 P(t) + i\alpha\,x(t)]\,dt\right\}. \qquad (45)$$

It may appear that the introduction of the process $x(t)$ did not
simplify matters, since the explicit evaluation of the expectations
again leads to an infinite functional series without adding insight into
the nature of the processor. We shall nevertheless find this representation
useful. As will be seen, when suitable approximations are made and
asymptotic behaviors explored, a great deal of insight can be gained
from the alternative representations for the likelihood† (40), (44), and
(45), as well as the discrete likelihood (41).

† By normalizing the exponent, i.e., introducing $\alpha^2$, we should actually use new
symbols to denote $g_m/\bar{g}$ and $R/R(0)$. To avoid introducing extra notation, we retain
the symbols $g_m$ and $R$, but we realize that, whenever $\alpha^2$ is present, these variables
have been normalized.
V. SMALL SIGNAL-TO-NOISE RATIO ($\alpha^2 \to 0$)

Here we consider the physical situation corresponding to small S/N
($\alpha^2 \to 0$). This occurs when a photodiode is used for direct detection.
In this application, the response to an individual photon is masked by
the background noise, and we do not expect the receiver to make
explicit use of the information supplied by an individual pulse. Rather,
the aggregate effect will be important. This is in contrast to the
"counting" receivers (for large $\alpha^2$), where individual counts contribute
explicitly to the final decision. Since the avalanche gains are unity in
this application, the likelihood function takes the form of (45). Two
signaling situations of interest are examined next.

5.1 M-ary signaling 

Since $\alpha^2 \ll 1$ (typically, $\alpha^2 = -20$ dB), our approach will be to
expand (45) in a power series in $\alpha$ and retain the terms through $\alpha^2$.†
Consider the following Taylor series approximation to the argument of
the expectation in (45). Again dropping the index $m$, let

$$\zeta(\alpha, x) = \exp\left\{\int_0^{\mathcal{T}} \lambda(t) \exp[\alpha^2 P(t) + i\alpha\,x(t)]\,dt\right\} \simeq e^{\Lambda} + \zeta'(0, x)\,\alpha + \zeta''(0, x)\,\frac{\alpha^2}{2}. \qquad (46)$$

Evaluating the derivatives, the asymptotic likelihood function becomes

$$L_m(y) \sim E_x\left\{1 + i\alpha \int_0^{\mathcal{T}} \lambda_m(t)\,x(t)\,dt + \frac{\alpha^2}{2}\left(\int_0^{\mathcal{T}} 2\lambda_m(t)P(t)\,dt - \int_0^{\mathcal{T}}\!\!\int_0^{\mathcal{T}} \lambda_m(t_1)\lambda_m(t_2)\,x(t_1)x(t_2)\,dt_1\,dt_2 - \int_0^{\mathcal{T}} \lambda_m(t)\,x^2(t)\,dt\right)\right\}. \qquad (47)$$

Recalling that the exponent has been normalized such that $E x = 0$,
$E x^2 = 1$, and $E x(t_1)x(t_2) = R(t_1 - t_2)$, we get, after performing the
averages,

$$L_m(y) \sim 1 + \alpha^2\left[\int_0^{\mathcal{T}} \lambda_m(t)P(t)\,dt - \frac{1}{2}\int_0^{\mathcal{T}}\!\!\int_0^{\mathcal{T}} \lambda_m(t_1)\lambda_m(t_2)R(t_1 - t_2)\,dt_1\,dt_2 - \frac{1}{2}\int_0^{\mathcal{T}} \lambda_m(t)\,dt\right] \qquad (48)$$

or

$$l_m(y) = \log L_m(y) \sim \int_0^{\mathcal{T}} \lambda_m(t)P(t)\,dt - \frac{1}{2}\int_0^{\mathcal{T}}\!\!\int_0^{\mathcal{T}} \lambda_m(t_1)\lambda_m(t_2)R(t_1 - t_2)\,dt_1\,dt_2 - \frac{1}{2}\Lambda_m. \qquad (49)$$

The detector involves linear operations on the filtered received signal
$P(t)$, addition of constants, and a maximization. As shown in Fig. 5,
a realization of the receiver is obtained by first passing the incoming
signal, $y(t)$, through a filter with impulse response $w(-t)/R(0)$ to produce
$P(t)$. This signal is then passed through a bank of $M$ filters with
impulse responses $\lambda_m(\mathcal{T} - t)$, $m = 1, 2, \ldots, M$, and sampled at
$t = \mathcal{T}$. This is the first term in (49). The other two terms are precomputable
biases. The detector then chooses the index $m^*$, which achieves
$\max_m L_m(y)$, and the corresponding $\lambda_{m^*}(t)$ is declared to be the
transmitted intensity.

Fig. 5. Correlator filter for M-ary signaling.

† Of course, the same answer would be obtained by working with (40).
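A discretized version of this receiver is easy to sketch in Python. The code below assumes the log-likelihood has the form of a correlation of $\lambda_m(t)$ with $P(t)$, minus one half of the double integral of $\lambda_m(t_1)\lambda_m(t_2)R(t_1 - t_2)$, minus one half of $\Lambda_m$, with all functions sampled on a uniform grid. The function names, the sampling scheme, and the assumption that $R$ is even and negligible beyond the supplied lags are illustrative choices.

```python
def small_snr_statistic(lam, p, r, dt):
    """Discretized small-alpha^2 statistic for one hypothesis.

    lam, p: samples of lambda_m(t) and the filtered signal P(t) on a grid of spacing dt.
    r[d]:   samples of the normalized pulse correlation R(d * dt), d >= 0 (R assumed even).
    """
    n = len(lam)
    linear = sum(l * v for l, v in zip(lam, p)) * dt
    quadratic = 0.0
    for i in range(n):
        for j in range(n):
            d = abs(i - j)
            if d < len(r):  # R is taken as zero beyond the supplied lags
                quadratic += lam[i] * lam[j] * r[d]
    quadratic *= dt * dt
    big_lambda = sum(lam) * dt
    return linear - 0.5 * quadratic - 0.5 * big_lambda

def detect_mary(hypotheses, p, r, dt):
    """Index of the hypothesis with the largest statistic."""
    scores = [small_snr_statistic(lam, p, r, dt) for lam in hypotheses]
    return max(range(len(scores)), key=scores.__getitem__)
```

Only the first term depends on the received data; the other two can be computed once per hypothesis and stored, which is what makes the receiver linear apart from precomputed biases.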

There is a pleasing interpretation of this receiver which is reminiscent of the "linear" model discussed in Section II. If one were to consider the detection problem when the signal $I(t)$, given by (1), is replaced by its average $E[I(t)] = \bar I(t)$, given by (6), then the optimum detector in gaussian noise would base its decision on the likelihood function

$$\mathcal{L} = \int_0^T y(t)\bar I(t)\,dt - \tfrac12\int_0^T [\bar I(t)]^2\,dt. \tag{50}$$

Substituting (6) into (50) gives

$$\begin{aligned}
\mathcal{L} &= \int_0^T dt\,y(t)\int_0^T w(t - \tau)\lambda(\tau)\,d\tau - \tfrac12\int_0^T\Big[\int_0^T\!\!\int_0^T w(t - t_1)\lambda(t_1)w(t - t_2)\lambda(t_2)\,dt_1\,dt_2\Big]dt \\
&= \int_0^T \lambda(\tau)P(\tau)\,d\tau - \tfrac12\int_0^T\!\!\int_0^T \lambda(t_1)\lambda(t_2)R(t_1 - t_2)\,dt_1\,dt_2.
\end{aligned} \tag{51}$$
Note that (51) differs from (49) only by the bias term involving $\Lambda_m$, which is associated with the probability that no photons have arrived at the photodetector. We conclude, therefore, that the optimum detector structure in the case of small $\alpha^2$ is "matched" to the average signal.

5.2 Optimum detection of PAM signals via the Viterbi algorithm 

We will now develop the optimum receiver structure (still for small $\alpha^2$) when the intensity is a pulse-amplitude modulated (PAM) signal†

$$\lambda(t) = \sum_{n=0}^{K} a_n f(t - nT), \qquad 0 \le t \le \mathcal{T}, \tag{52}$$

where each $a_n$ can assume the binary values 0 or 1, $f(t)$ is a positive-valued pulse that incorporates the distortion of the optical medium, $1/T$ is the symbol rate, and $\mathcal{T} > KT$. Note that in writing (52) we have dropped the subscript $m$, which we have used to identify the transmitted signal (intensity), since for PAM signaling it is generally more convenient to think of the receiver as finding the sequence $\{a_n\}$ that maximizes the likelihood. Substituting (52) in (49) and emphasizing that the likelihood function is now to be regarded as a function of a particular data sequence (which uniquely corresponds to a specific intensity) gives

$$L(a_1, a_2, \ldots, a_K) = \sum_{n=1}^{K} a_n z_n - \tfrac12\sum_{n,m=1}^{K} a_n a_m \mathcal{K}_{n-m}, \tag{53}$$

where

$$z_n = \int_0^{\mathcal{T}} \big[P(t) - \tfrac12\big] f(t - nT)\,dt \tag{54}$$

is the response at time $nT$ of a filter matched to $f(t)$ when the input is $P(t) - \tfrac12$, and the correlation-type function $\mathcal{K}$ is defined by

$$\begin{aligned}
\mathcal{K}_{n-m} &= \int_0^{\mathcal{T}} d\tau\Big(\int_0^{\mathcal{T}} dt\,f(t - nT)w(t - \tau)\Big)\Big(\int_0^{\mathcal{T}} dt'\,f(t' - mT)w(t' - \tau)\Big) \\
&= \int_0^{\mathcal{T}} U(t - nT)U(t - mT)\,dt \\
&= \int_0^{\mathcal{T}} U(t)U[t - (n - m)T]\,dt,
\end{aligned} \tag{55}$$

with

$$U(\tau - nT) = \int_0^{\mathcal{T}} f(t_1 - nT)w(\tau - t_1)\,dt_1$$

and the observation time $\mathcal{T}$ is taken to be extremely large ($\mathcal{T} \gg T$).

† Note that we have neglected the dark current $\lambda_0$. This obviously does not alter the final results. Also, the results can, in a straightforward manner, be extended to the multilevel case.

The receiver structure indicated by (53) to (55) is similar to the maximum likelihood (ML) receiver for detecting a PAM signal distorted by a noisy linear channel.¹⁷ The received signal is first passed through the matched filter $w(-t)$, and then (minus the bias term $\tfrac12$) through a filter matched to $f(t)$, i.e., with impulse response $f(-t)$. The result is sampled at the synchronous instants $nT$. This produces the set of sufficient statistics $\{z_n\}$, from which the precomputable bias term $\tfrac12\sum_{n,m=1}^{K} a_n a_m \mathcal{K}_{n-m}$ is subtracted to produce the likelihood function.

The method by which the likelihood (53) is sequentially maximized in real time has become known as the Viterbi algorithm (VA), as a result of its application to the analogous problem of ML detection of linearly distorted PAM data signals.

The VA is a dynamic programming algorithm that uses the "finite memory" of $\mathcal{K}_n$, i.e., the fact that there will always be a $k$ such that, for all practical purposes,

$$\mathcal{K}_n = 0, \qquad |n| > k. \tag{56}$$

Because of (56), it is easy to see that the likelihood, (53), can be written in the recursive form

$$L(a_1, a_2, \ldots, a_K) = L(a_1, a_2, \ldots, a_{K-1}) + a_K z_K - \tfrac12 a_K\sum_{m=K-k}^{K} a_m \mathcal{K}_{K-m}. \tag{57}$$

By introducing the sequence of state vectors $\{\sigma_n\}$, where

$$\sigma_n = (a_{n-(k-1)}, \ldots, a_n), \qquad n = 1, 2, \ldots, K, \tag{58}$$

the likelihood can be written in the form

$$L(a_1, \ldots, a_K) = L(a_1, \ldots, a_{K-1}) + h(z_K; \sigma_K). \tag{59}$$

As is well known, the maximization of the function $L(a_1, \ldots, a_K)$ with respect to its arguments is amenable to solution via dynamic programming since (59) is satisfied. The optimum receiver therefore assumes the structure shown in Fig. 6.
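
The dynamic-programming maximization of (53) can be sketched as below. This is a hedged illustration rather than the authors' implementation: the branch metric is derived directly from (53) under the finite-memory condition (56), the kernel is assumed symmetric, symbols are binary, and symbols before the start of the block are taken to be zero. The names `likelihood`, `viterbi`, and `kernel` are hypothetical; `kernel[d]` holds the correlation value at lag `d` for `d = 0..memory`.

```python
def likelihood(a, z, kernel):
    """Evaluate (53) directly for a 0/1 sequence a (used as a reference)."""
    memory = len(kernel) - 1
    total = sum(an * zn for an, zn in zip(a, z))
    for n in range(len(a)):
        for m in range(len(a)):
            lag = abs(n - m)
            if lag <= memory:          # finite memory, as in (56)
                total -= 0.5 * a[n] * a[m] * kernel[lag]
    return total


def viterbi(z, kernel):
    """Maximize (53) over binary sequences with a fixed number of states.

    The state is the tuple of the last `memory` symbols. Appending symbol a
    at step n changes the likelihood by
        a*z_n - 0.5*a*a*kernel[0] - a * sum_{d=1..memory} a_{n-d}*kernel[d].
    """
    memory = len(kernel) - 1
    # Map: state -> (best metric so far, best path so far).
    survivors = {(0,) * memory: (0.0, [])}
    for zn in z:
        updated = {}
        for state, (metric, path) in survivors.items():
            # Interference from the symbols remembered in the state.
            cross = sum(state[memory - d] * kernel[d]
                        for d in range(1, memory + 1))
            for a in (0, 1):
                m = metric + a * zn - 0.5 * a * a * kernel[0] - a * cross
                nxt = (state + (a,))[1:]
                if nxt not in updated or m > updated[nxt][0]:
                    updated[nxt] = (m, path + [a])
        survivors = updated
    return max(survivors.values(), key=lambda s: s[0])
```

The number of survivors never exceeds two raised to the memory length, so the work per received sample stays constant as the sequence grows, which is the property the text attributes to the finite memory of the kernel.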

In summary, it has been shown that the ML receiver for the limiting case of small S/N has a structure that is asymptotically approximated by the receiver designed to detect a known signal in gaussian noise (with the inclusion of certain precomputed bias terms). We remark at this point that the application of the Viterbi algorithm is, of course, only productive when intersymbol interference is the dominant impairment. In the context of the above discussion, this will be manifested in the values of $\mathcal{K}_n$ for $n \ne 0$. These values depend on the data rate relative to the channel dispersion. As in data transmission over voiceband channels, other methods of processing such as linear and decision feedback equalization should provide good results so long as the intersymbol interference is not inordinately large. It is clear from (53) that when the distortion is small enough so that the quadratic term can be neglected, the optimization of the likelihood with respect to the data symbols can be carried out on a term-by-term or bit-by-bit basis. In other words, passing $z_n$ through a slicer provides optimum detection. As the distortion becomes more severe, the quadratic term appearing in (53) must be retained. The linear receivers reported by Personick⁵⁻⁷ and Messerschmitt⁸,⁹ can be obtained from (53) by differentiating this expression with respect to the data symbols and then quantizing the result to the legitimate transmitted data levels. As the distortion increases still further, it becomes necessary to maximize (53), as it stands, via the Viterbi algorithm. Selecting one of these detectors in any given situation requires an evaluation of the error probability to quantify the effect of distortion on the system performance.

Fig. 6 — Optimum detector (large noise) for PAM signaling. [Block diagram: filter $f(-t)$, sampler at $nT$, and the VA operating on the likelihood (53).]

VI. PERFORMANCE ANALYSIS OF THE OPTIMUM DETECTOR FOR BINARY ONE-SHOT SIGNALING

6.1 An upper bound on the error rate (a simple example) 

Having a description of the optimum detector structure for $\alpha^2 \to 0$,
it is interesting to inquire how well it performs in certain signaling 
situations. Unfortunately, the M-ary mode of operation is extremely 
difficult to analyze, and even the general binary case poses insur- 
mountable mathematical difficulties. We have, however, been able to 
analyze several special cases of interest that provide insight as to the 
effect of various system parameters on performance. 

In the binary signaling case, information is conveyed by sending either intensity $\lambda_1(t)$ or $\lambda_2(t)$ with equal probability. From (51), the ML detector has the realization shown in Fig. 7. The detector, in this situation, computes the statistic

$$\mu = \int_0^T [\lambda_1(t) - \lambda_2(t)]P(t)\,dt - \tfrac12\int_0^T [\lambda_1(t) - \lambda_2(t)]\,dt - \tfrac12\int_0^T\!\!\int_0^T \tilde R(t - \tau)[\lambda_1(t)\lambda_1(\tau) - \lambda_2(t)\lambda_2(\tau)]\,dt\,d\tau, \tag{60}$$

and $\mu$ is then compared to zero. When $\mu > 0$, it is decided that $\lambda_1(t)$ was sent, and when $\mu \le 0$, $\lambda_2(t)$ is chosen. In (60), the indicated quantities are normalized such that

$$P(t) = \frac{1}{R(0)}\int_0^T y(\tau)w(t - \tau)\,d\tau$$

and $\tilde R = R/R(0)$.

Fig. 7 — Optimum detector for binary signaling ($\alpha^2 \to 0$). [Block diagram: filter $\lambda_1(T - t) - \lambda_2(T - t)$ applied to $P(t)$, addition of the precomputed bias terms, and comparison with zero.]

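The probability of error is

$$P_e = \tfrac12\Pr[\mu \le 0 \mid y(t) = I_1(t) + n(t),\ 0 \le t \le T] + \tfrac12\Pr[\mu > 0 \mid y(t) = I_2(t) + n(t),\ 0 \le t \le T], \tag{61}$$

where

$$I_1(t) = \sum_{n=1}^{\nu} w(t - t_n) \quad\text{with}\quad E[\nu] = \int_0^T \lambda_1(\xi)\,d\xi,$$

and where

$$I_2(t) = \sum_{n=1}^{\nu} w(t - t_n) \quad\text{with}\quad E[\nu] = \int_0^T \lambda_2(\xi)\,d\xi.$$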

It turns out that the evaluation of (61) is not mathematically tractable when $\lambda_1$ and $\lambda_2$ are arbitrary positive time functions. Even reasonable bounds on (61) are difficult to calculate in general. However, for constant intensities, exponentially tight upper bounds can be obtained. While the restriction to constant intensities might appear severe, it is shown in the appendix that in the absence of both dark current and gaussian noise the optimum choice of signals will have one intensity equal to zero while the other is arbitrary and need only satisfy a power constraint. Here we wish to illustrate a bounding approach for one special case where the upper bound can be obtained in closed form. We analyze the error rate for a system slightly modified from that depicted in Fig. 8 for $\lambda_1 = 0$ and $\lambda_2 = \lambda$. The modification will involve adjusting the threshold* so that our upper estimate of the probability of error when $\lambda_1$ is sent is equal to the estimate of probability of an error in the complementary situation.

Fig. 8 — Optimal detector for $\alpha^2$ small and $\lambda_1 = 0$ and $\lambda_2 = \lambda$.

In the binary system under consideration, the information symbol 1 is encoded into the intensity function $\lambda_1(t) = \lambda$, $0 \le t \le T$, and the information symbol 0 into the intensity $\lambda_2(t) = 0$, $0 \le t \le T$. Notice that the dark current is assumed to be zero. The detector structure we wish to analyze is depicted in Fig. 9. Here, the information-bearing Poisson process is passed through a matched filter $w(-t)/R(0)$, then integrated, and the result compared with a threshold set at $F$. If $\mu$ (refer to the block diagram) exceeds $F$, the symbol 1 is chosen, and if $\mu \le F$, the symbol 0 is chosen. Our chief interest in this example is to exhibit the interplay between the various parameters in this extremely simple but informative situation.

As seen in the diagram,

$$\mu = \nu\int_0^T R(t)\,dt + \int_0^T\!\!\int_0^T n(\tau)w(t - \tau)\,dt\,d\tau \tag{62}$$

or, equivalently, the test statistic may be written as

$$\mu_0 = \nu + n_0, \tag{63}$$

which is compared to a threshold. Note that $\mu_0$ is just a scaled version of $\mu$, and $n_0$ is a zero-mean gaussian random variable with

$$E(n_0^2) = N_0\,\frac{\int_0^T\!\!\int_0^T R(t - \tau)\,dt\,d\tau}{\Big[\int_0^T R(t)\,dt\Big]^2} \equiv \sigma^2. \tag{64}$$



Observe that, in this situation, the receiver is just a counter since the 
test statistic represents the total number of photon counts observed 
in the entire observation interval plus an added gaussian random 
variable. 



* By the threshold, we mean the bias terms appearing in (60), i.e., the last two 
terms. 
Fig. 9 — Detector for $\alpha^2$ small and $\lambda_1 = 0$, $\lambda_2 = \lambda$ with threshold modified so error probabilities are equal.



The integer random variable $\nu$ is Poisson-distributed with

$$E[\nu] = \lambda T \ \text{ when 1 is sent } (H_1) \quad\text{and}\quad E[\nu] = 0 \ \text{ when 0 is sent } (H_0), \tag{65}$$

where $H_0$ and $H_1$ are symbols distinguishing the two situations. The probability of error is then explicitly given by

$$P_e = \tfrac12\Pr[\mu_0 > F \mid H_0] + \tfrac12\Pr[\mu_0 \le F \mid H_1], \tag{66}$$

where we have made the assumption that 0s and 1s are transmitted with equal probability.

Since (66) cannot be expressed in closed form, we seek an exponentially tight upper bound. Applying the Chernoff bounding technique, we notice that the error rate under the null hypothesis, $H_0$, can be upper bounded immediately since under this hypothesis $\nu = 0$. Applying the bound yields

$$P_0 = \Pr[\mu_0 > F \mid H_0] \le \exp\Big(-\frac{F^2}{2\sigma^2}\Big). \tag{67}$$

The second term in (66) can likewise be upper bounded since the moment generating function of $\nu$ under $H_1$ is known. This procedure gives

$$P_1 = \Pr[\mu_0 \le F \mid H_1] \le \exp\Big[\theta F + \frac{\sigma^2\theta^2}{2}\Big]M_{\nu|H_1}(-\theta), \qquad \theta \ge 0, \tag{68}$$

where

$$M_{\nu|H_1}(\theta) = E\{e^{\theta\nu} \mid H_1\} = \exp[\lambda T(e^{\theta} - 1)]$$

for $\theta \ge 0$. The bound (68) then becomes

$$P_1 \le \exp\Big[\theta F + \frac{\sigma^2\theta^2}{2} + \lambda T(e^{-\theta} - 1)\Big], \qquad \theta > 0, \tag{69}$$

and it is optimized by finding a $\theta^*$ such that

$$\Omega(\theta^*, F) = \min_{\theta}\Big\{\theta F + \frac{\sigma^2\theta^2}{2} + \lambda T(e^{-\theta} - 1)\Big\}. \tag{70}$$
To make the upper bounds on $P_0$ and $P_1$ equal, we select an $F = F_0$ such that

$$\Omega(\theta^*, F_0) = -\frac{F_0^2}{2\sigma^2}.$$

This, then, yields the final upper bound on the error rate

$$P_e \le \exp(-F_0^2/2\sigma^2). \tag{71}$$

By differentiating (70), we see that for a positive solution to exist it is required that $0 < F < \lambda T$. Unfortunately, such a solution cannot be obtained in closed form. However, lower bounding $1 - e^{-\theta}$ by $\theta - \theta^2/2$, which in turn upper bounds (69), we find that

$$\theta^* = \frac{\lambda T - F}{\lambda T + \sigma^2} \tag{72}$$

and consequently

$$P_1 \le \exp\Big[-\theta^*(\lambda T - F) + \frac{\theta^{*2}}{2}(\lambda T + \sigma^2)\Big], \tag{73}$$

where $\theta^*$ has been chosen to provide the tightest bound.
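Having $\theta^*$, the threshold $F_0$ is obtained from

$$\frac{F_0^2}{\sigma^2} = \frac{(\lambda T - F_0)^2}{\lambda T + \sigma^2}.$$

Solving this quadratic equation and selecting the only reasonable root for $F_0$ give

$$F_0 = -\sigma^2 + \sqrt{\sigma^4 + \sigma^2\lambda T}. \tag{74}$$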

Substituting (74) into (72), the bound on the error rate finally becomes

$$P_e \le \exp\Big[-\frac{K}{2}\big\{\sqrt{1 + C} - \sqrt{C}\big\}^2\Big], \tag{75}$$

where $K = \lambda T$ and

$$C = \frac{\sigma^2}{K} = \frac{\text{Average Noise Power}}{\text{Average Shot Noise Power}}.$$

It is instructive to express the bound (75) in the following alternative form

$$P_e \le e^{-\rho f(c)}, \tag{76}$$

where

$$\rho = \frac12\,\frac{K^2}{K + \sigma^2} = \frac{\text{Average (signal)}^2}{\text{Average Total Noise Power}}$$

and where $f(c) = [1 + c - \sqrt{c^2 + c}]^2$.

As can be checked, $f(c)$ is a monotonically decreasing function of $c$, and has the properties

$$\lim_{c \to 0} f(c) = 1, \qquad \lim_{c \to \infty} f(c) = \tfrac14.$$

Thus, $P_e \le e^{-\rho f(c)} \to e^{-\rho}$, as $c \to 0$.

This is the situation that prevails when the shot noise dominates. On the other hand,

$$P_e \le e^{-\rho/4}, \quad\text{as } c \to \infty,$$

which is the situation when the gaussian noise dominates.
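
The two forms of the bound, (75) and (76), and the limiting values of $f$ can be checked numerically. The sketch below assumes that the argument of $f$ is the same ratio $C = \sigma^2/K$ that appears in (75); the function names are illustrative only.

```python
import math


def f(c):
    """The function f(c) = [1 + c - sqrt(c^2 + c)]^2 used in (76)."""
    return (1.0 + c - math.sqrt(c * c + c)) ** 2


def bound_75(K, sigma2):
    """Upper bound (75), with C = sigma^2 / K."""
    C = sigma2 / K
    return math.exp(-0.5 * K * (math.sqrt(1.0 + C) - math.sqrt(C)) ** 2)


def bound_76(K, sigma2):
    """Upper bound (76), with rho = K^2 / (2 (K + sigma^2))."""
    rho = 0.5 * K * K / (K + sigma2)
    return math.exp(-rho * f(sigma2 / K))
```

Evaluating `f` at a very small and a very large argument gives values close to 1 and 1/4 respectively, and `bound_75` and `bound_76` agree for any positive `K` and `sigma2`, since $(1 + C)(\sqrt{1 + C} - \sqrt{C})^2 = [1 + C - \sqrt{C^2 + C}]^2$.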

6.2 Implications of the error bound 

The first observation concerning (75) is that, as $C \to 0$, $P_e \le \exp(-K/2)$. This can be achieved by making $\sigma^2 \to 0$. This implies that either the gaussian noise is zero or that the number of counts is very large. However, in the absence of gaussian noise (as well as dark current), it is clear that the only way to make an error is when there are not any counts ($\nu = 0$) under $H_1$. The chance that $\nu = 0$ under $H_1$ is just $\exp(-K)$. In the absence of gaussian noise, this is clearly the very best performance one can hope for. Notice that the upper bound predicts an outcome which is 3 dB poorer than this ideal. The factor of 2 in the exponent of (75) is attributed to our bounding technique. What, in fact, happens as $\sigma^2 \to 0$ is that $\theta^*$ increases, and that the lower bound $\theta - \theta^2/2$ becomes loose, the upshot being the factor of 2 in the exponent. To see that this factor of 2 is indeed a quirk of the parabolic approximation to the exponential, consider the exponent in (69) as $\sigma^2 \to 0$. It is clear that the optimum threshold and $\theta$ are, respectively, zero and infinity, which when substituted in (69) does indeed give $e^{-K}$ ($K = \lambda T$).

Another aspect of the bound, however, is that ideal performance can be achieved with this detector structure (which is optimum for $\sigma^2 \to \infty$, the large gaussian noise situation) when the noise vanishes ($\sigma^2 \to 0$). This suggests that for the case of constant intensities the linear threshold detector is robust, i.e., it performs well over the entire range of $\sigma^2$ (or $\alpha^2$).

We now use the error bound to determine the number of counts required, for reasonable operating physical parameters, to achieve a desired error rate. Note that, from (64), after a simplifying calculation on the double integral, we obtain

$$\sigma^2 = N_0\,\frac{\int_0^T\!\!\int_0^T R(t - \tau)\,dt\,d\tau}{A^2} = \frac{2N_0}{A^2}\Big\{TA - \int_0^T tR(t)\,dt\Big\} = \frac{2N_0}{A}\Big[T - \frac{\int_0^T tR(t)\,dt}{\int_0^T R(t)\,dt}\Big], \tag{77}$$

where

$$A = \int_0^T R(t)\,dt.$$

Introducing the pulse stretch factor,

$$S = \int_0^T tR(t)\,dt \Big/ \int_0^T R(t)\,dt < T, \tag{78}$$

into (77) and recalling that $\alpha^2 = R(0)/N_0$ yields explicitly

$$\sigma^2 = \frac{2}{\alpha^2}\cdot\frac{1 - r}{r}\cdot\frac{SR(0)}{A}, \tag{79}$$

where $0 < r = S/T < 1$. What, then, can be said about the choice of the parameter $r$? Can it be selected at will? Within a good approximation, $SR(0)/A \approx 1$. Clearly, the best choice of $r$ appears to be unity, since $r = 1$ reduces the noise variance to zero. Recall, however, that, when the mathematical model was initially introduced, it was tacitly assumed that the observation interval was much larger than the width of the pulses emanating from the photodetector so that edge effects could be neglected. This alone would restrict the range of $r$ to be no more than, say, 0.1, which would indicate that $r$ does not appear to be an independent parameter. With $r = 0.1$, we may conclude from (79) that the effective gaussian variance of the scaled system is roughly

$$\sigma^2 = 20/\alpha^2. \tag{80}$$

Returning now to (75), we see that ideal performance is achieved when

$$C = \frac{\sigma^2}{K} \ll 1,$$

and when (80) is substituted in the above, we arrive at the condition that

$$\frac{20}{K\alpha^2} \ll 1 \;\rightarrow\; K\alpha^2 \gg 20. \tag{81}$$

As an example, let $\alpha^2 = 1/400$, which, according to S. Personick,† is a reasonable number for this parameter. This implies that $K \gg 8000$ is required to achieve ideal performance (i.e., the error rate in this range approaches zero like $e^{-K}$). On the other hand, suppose it is desired that $P_e \le 10^{-9}$. This would imply that

$$\frac{K}{2}\Big\{\sqrt{(20/K\alpha^2) + 1} - \sqrt{20/K\alpha^2}\Big\}^2 \approx 20.$$

For instance, $\alpha^2 \approx 1/400$ implies that $K$ is on the order of 1200. The above discussion quantifies the fact that, to achieve good performance, the total number of counts must be large; if the background gaussian noise is small, fewer counts are needed to provide satisfactory performance.

† Private communication.
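
The count requirement quoted above can be reproduced by solving the bound (75) for $K$. Writing the exponent as $\tfrac12(\sqrt{K + \sigma^2} - \sigma)^2$ and setting it equal to $\ln(1/P_e)$ gives $K = (\sigma + \sqrt{2\ln(1/P_e)})^2 - \sigma^2$. The sketch below uses this closed form together with $\sigma^2 = 20/\alpha^2$ from (80); the function name and the `factor` argument are illustrative assumptions.

```python
import math


def required_counts(pe, alpha2, factor=20.0):
    """Average count K at which the bound (75) equals the target error rate.

    pe     : target error probability, e.g. 1e-9
    alpha2 : the parameter alpha^2 = R(0)/N_0
    factor : numerator in sigma^2 = factor / alpha^2, taken from (80)
    """
    sigma2 = factor / alpha2
    root = math.sqrt(2.0 * math.log(1.0 / pe))
    return (math.sqrt(sigma2) + root) ** 2 - sigma2
```

With `pe = 1e-9` and `alpha2 = 1/400` the result is a little under 1200, in line with the order of magnitude stated in the text (the text rounds $\ln 10^9 \approx 20.7$ to 20).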

6.3 Some conclusions concerning optimum detection for constant intensities 

Note that the linear receiver, which is optimum when $\alpha^2 \to 0$, seems to be robust, at least for binary systems signaling with constant intensities. The optimum detector in the small S/N case ($\alpha^2 \to 0$) yields a decision variable based on the total number of observed counts, as is evident from (63). Of course, for the error probability bound to be tight, the average number of counts, $K$, must be large enough that $\sigma^2/K \ll 1$. On the other hand, we saw that the optimum detector structure in the case of large S/N ($\alpha^2 \to \infty$) combined with narrow pulses* ($r \ll 1$) is also a counter. The only difference is that the counts in the $\alpha^2 \to 0$ detector are linearly corrupted by gaussian noise, while the counts in the $\alpha^2 \to \infty$ detector are determined by quantizing the incoming signal to the nearest integer in the presence of the added gaussian noise. The latter operation is, of course, nonlinear. Nevertheless, when the added noise is small ($\alpha^2 \to \infty$), the two operations are approximately equivalent, thus explaining the robustness of the linear receiver and the results of our theory.

VII. LARGE SIGNAL-TO-NOISE RATIO ($\alpha^2 \to \infty$) AND NARROW PULSES

When a photomultiplier or avalanche photodiode is used to provide 
direct detection, the parameter $\alpha^2$ is much larger than unity. In this
application, the response of the photodetector to a single electron or 
hole is much larger than the background gaussian noise. In this situa- 
tion, intuition dictates that the detector make use of the "estimated" 
arrival times of the individual photons. Here we discuss a special case 
that will bring out the essential structure of the optimum detector. 
The situation examined is when there is no avalanche gain and the 
individual pulses w(t) are time-limited to an interval much smaller 
than the observation interval. The more general situation is treated 
in Section VIII. 

The approach taken in this section is to use the gaussian process formulation (45) and attempt to approximate the indicated expectation with respect to the $x(t)$ process. For this approach to be productive, we must assume that $R(t)$ has effective duration $\Delta$. We may then approximate the integral appearing in (45) by a discrete sum, i.e.,

$$\int_0^T dt\,\exp[\alpha^2 P(t) + i\alpha x(t)]\lambda_m(t) \rightarrow \Delta\sum_{j=1}^{J}\exp(\alpha^2 P_j + i\alpha x_j)\lambda_m(j\Delta), \tag{82}$$

where $P_j = P(j\Delta)$ and $x_j = x(j\Delta)$.

* This was demonstrated in the examples of Section III and is reestablished in Section VII.

The implication of (82) can be viewed in several ways. Of course, as $\Delta \to 0$ and $J \to \infty$, irrespective of the correlation function $R(t)$, the discrete sum is an excellent approximation to the integral. But sampling the integrand at the rate $1/\Delta$ does not necessarily guarantee that the sum is a good approximation to the integral. Yet to derive any utility from representation (45), we must sample at a rate $1/\Delta$ so that the sequence of random variables $\{x_j\}$ can be regarded as identically and independently distributed. Unfortunately, this is the only case for which we can compute the indicated averages in a useful form. What then do we mean by (82)? To make sense of this representation, we must reinterpret the distribution of the arrival times, $\{t_n\}$. Evidently, the reason we have an integral representation instead of a sum is that we have assumed that the arrival times obey a continuous distribution. However, if we assume at the outset that the arrival times $\{t_n\}$ can occur only at a set of discrete points $\{t_n = n\Delta\}$, then (45) will contain a sum instead of an integral. This procedure is equivalent to that used to obtain (41) as the discrete version of (40). Hence, a rigorous interpretation of (45) is that the Poisson arrival times can only occur at the discrete instants of time $\{j\Delta\}$, $j = 0, 1, 2, \ldots$. If we now assume that the quantization of the arrival times to units of $\Delta$ is such that $R(\Delta) \approx 0$, then the random variables $\{x_j\}_{j=1}^{J}$ are mutually independent. Exploring this line of reasoning, (45) can be written as

$$e^{\Lambda}L(y) = \prod_{j=1}^{J} E_{x_j}\big[\exp\{\Delta\lambda_j\exp(\alpha^2 P_j + i\alpha x_j)\}\big], \tag{83}$$

where $\lambda_j = \lambda(j\Delta)$, and we have suppressed the index $m$ denoting the particular hypothesis being tested.

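Expanding (83) in a power series and carrying out the indicated expectation give

$$e^{\Lambda}L(y) = \prod_{j=1}^{J}\Big\{\sum_{n=0}^{\infty}\frac{(\Delta\lambda_j)^n}{n!}\exp\Big[\alpha^2\Big(nP_j - \frac{n^2}{2}\Big)\Big]\Big\}. \tag{84}$$

We are now in a position to exploit the assumed large value of $\alpha^2$. In other words, we are interested in determining the behavior of (84) as $\alpha^2 \to \infty$. Towards this goal, consider the sum

$$S_j = \sum_{n=0}^{\infty}\frac{(\Delta\lambda_j)^n}{n!}\exp\{\alpha^2[nP_j - n^2/2]\}. \tag{85}$$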
This series is in the form

$$\sum_n \beta_n\exp(\alpha^2\gamma_n), \tag{86}$$

where $\beta_n$ and $\gamma_n$ have the obvious identifications.

Let $\gamma_j$ be the largest of the $\gamma_n$ and $\beta_j$ be the corresponding value of $\beta_n$. Then (86) becomes

$$\beta_j\exp(\alpha^2\gamma_j)\Big(\sum_n \frac{\beta_n}{\beta_j}\exp[\alpha^2(\gamma_n - \gamma_j)]\Big), \tag{87}$$

where each $\gamma_n - \gamma_j$ is negative. Since (86) converges absolutely, the infinite sum can be rearranged in such a way that the exponents are decreasing; thus, the rearranged sum is recognized to be a Dirichlet series¹⁸ in the parameter $\alpha^2$. From the elementary properties of such series, we deduce that, except for the $n = j$ term, the summation portion of (87) converges to zero as $\alpha^2 \to \infty$. So, as $\alpha^2 \to \infty$, (87) behaves like

$$S_j \sim \beta_j\exp(\alpha^2\gamma_j). \tag{88}$$

Now returning to the series in (85), we let $n_j$ denote the strictly nonnegative integer attaining

$$\max_{n \in \{0, 1, 2, \ldots\}} \alpha^2\Big[nP_j - \frac{n^2}{2}\Big], \tag{89}$$

i.e., $n_j = [P_j]$, where $[P]$ denotes the nonnegative integer nearest to $P$. The corresponding coefficient becomes

$$\beta_j = \frac{\Delta^{n_j}(\lambda_j)^{n_j}}{n_j!}. \tag{90}$$
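
The dominance of a single term in the series (85) for large $\alpha^2$ can be illustrated numerically. The sketch below works with logarithms of the terms to avoid overflow and truncates the series at `n_max` terms, which is an assumption made for the illustration; the function name and arguments are hypothetical.

```python
import math


def dominant_term(P, alpha2, lam_delta, n_max=60):
    """Locate the largest term of the series (85) and measure its dominance.

    P         : a sample P_j of the filtered signal
    alpha2    : the parameter alpha^2
    lam_delta : the product Delta * lambda_j (must be positive)
    Returns (index of largest term, log(series) - log(largest term)).
    """
    logs = [
        n * math.log(lam_delta) - math.lgamma(n + 1)
        + alpha2 * (n * P - 0.5 * n * n)
        for n in range(n_max)
    ]
    n_star = max(range(n_max), key=logs.__getitem__)
    top = logs[n_star]
    # log-sum-exp of all terms, relative to the largest one.
    gap = math.log(sum(math.exp(v - top) for v in logs))
    return n_star, gap
```

For a large value of `alpha2` the returned index coincides with the nearest nonnegative integer to `P`, and the gap is essentially zero, which is the behavior stated in (88) and (89). For small `alpha2` the coefficient can shift the largest term and the gap is no longer negligible.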

Thus, as $\alpha^2 \to \infty$,

$$l(y) = \log L(y) = -\Lambda + \log\Big\{\prod_{j=1}^{J}\Big(\frac{(\Delta\lambda_j)^{n_j}}{n_j!}\exp[\alpha^2(n_jP_j - n_j^2/2)]\Big)\Big\}. \tag{91}$$

Discarding the hypothesis-insensitive terms, (91) can be rewritten in the form

$$l_m(y) \sim -\Lambda_m + \sum_{j=1}^{J} n_j\log\lambda_j^{(m)}, \tag{92}$$

where we recall that $n_j = 0$ whenever $P_j < \tfrac12$. Note that (92) is similar to the detector described by (56); however, the different statistical model (Bernoulli as opposed to Poisson occurrences) accounts for the bias term $-\Lambda_m$ appearing in (92).

The detector structure exhibited in (92) has a simple interpretation and is similar to that depicted in Fig. 3. As shown in Fig. 10, the incoming signal $y(t)$, having been filtered by $w(-t)/R(0)$, is sampled every $\Delta$ seconds. This is followed by quantizing the samples, $P_j$, to the nearest positive integer (including zero whenever $P_j < \tfrac12$). The quantized samples are multiplied by the locally stored numbers $\log\lambda_j^{(m)}$ and the results summed. The sum is added to $-\Lambda_m$ to form the decision statistic. Since the added gaussian noise is assumed to be small and the pulse $w(t)$ is assumed to be narrow, most of the time the nearest integer at any time $t_j = j\Delta$ will be either 1 or 0, depending on whether $P_j > \tfrac12$ or $P_j \le \tfrac12$, i.e., whether the receiver determines a pulse is present or absent. Consequently, the optimum detector structure may be viewed as a weighted counter, where the decision as to which intensity was transmitted is based on selecting the largest of the weighted pulse counts.

Fig. 10 — Weighted counter with quantizer.
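
A minimal sketch of the weighted counter follows, assuming the decision statistic is $-\Lambda_m + \sum_j n_j\log\lambda_j^{(m)}$ with $\Lambda_m$ approximated by a Riemann sum and all sampled intensities strictly positive (for example because of a nonzero dark current). The names `quantize` and `weighted_counter` are illustrative and not taken from the paper.

```python
import math


def quantize(p):
    """Nearest nonnegative integer to p, with zero whenever p <= 1/2."""
    return max(0, math.ceil(p - 0.5))


def weighted_counter(P_samples, intensities, delta):
    """Choose the candidate intensity with the largest weighted count.

    P_samples   : samples P_j of the filtered signal, taken every delta seconds
    intensities : list of candidates, each a list of positive samples lambda_j
    delta       : sampling interval (assumed to be about the pulse width)
    """
    counts = [quantize(p) for p in P_samples]
    metrics = []
    for lam in intensities:
        big_lambda = sum(lam) * delta          # Riemann-sum estimate of Lambda_m
        weighted = sum(n * math.log(l) for n, l in zip(counts, lam) if n > 0)
        metrics.append(-big_lambda + weighted)
    best = max(range(len(metrics)), key=metrics.__getitem__)
    return best, counts, metrics
```

When no pulses are detected the bias term alone decides, which favors the weakest candidate intensity; each detected pulse then adds the stored logarithmic weight for the time slot in which it occurred.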

We recognize that from an implementation point of view even this 
seemingly simple structure may pose practical difficulties. The indi- 
cated sampling may be difficult to carry out at this high frequency. 
While this is indicated mathematically, in practice the peaks of the 
signal at the photodetector output could be used to approximate the 
photon arrival times and, hence, the interrogation times. 



VIII. MAXIMUM LIKELIHOOD RECEIVER FOR LARGE SIGNAL-TO-NOISE RATIO ($\alpha^2 \to \infty$)

This section extends the results of the last section by indicating a general approach to the extremely complex problem of performing optimum detection when the pulses $w(t)$ are not restricted in width or shape and when avalanche gain is provided. In the presence of avalanche gain, the average signal-to-noise ratio, $\alpha^2$, is large. This implies that the photon arrival times can be accurately estimated, and these estimates can then be used to aid the detector in making accurate decisions. One objective of this section is to indicate how the optimum detector estimates the arrival times. Heuristically, the receiver attempts to "whiten" or peak up the pulse $w(t)$. The presence of gaussian noise, however small, prevents pulse whitening via linear filtering. The nonlinear manner in which the receiver estimates the arrival times is of independent interest and will be presented in the sequel.

We begin with the most general form of the likelihood function (40). While the infinite functional series appearing in (40) is quite intimidating, it has already been shown to reduce to physically interpretable receivers in the following special cases: (i) small signal-to-noise ratio ($\alpha^2 \to 0$) and (ii) large signal-to-noise ratio ($\alpha^2 \to \infty$) combined with an extremely small decorrelation time† for $R(t)$.

Since large $\alpha^2$ is a practical operating condition (photomultiplier and the avalanche photodiode), we are motivated to examine the salient features of the optimal processor under these circumstances. We also specialize our development to the PAM-Poisson intensity, or data signal,

$$\lambda_m(t) = \lambda_0 + \sum_n a_n^{(m)} f(t - nT), \qquad 0 \le t \le \mathcal{T},$$

where $f(t)$ is a known pulse shape determined by the distortion (intersymbol interference) in the optical fiber and $\lambda_0$ is again the ambient or "dark" current. Here, the optimum receiver maximizes the likelihood function with respect to the data sequence $\{a_n^{(m)}\}$. As it stands, the likelihood (40) is similar‡ in form to the Volterra kernel description of a general time-varying nonlinear functional. However, such generality seems to preclude any practical value, and furthermore reveals little of the receiver's essence. To obtain a good approximation to the structure of the receiver when $\alpha^2 \to \infty$, it will again be necessary to discretize the photon arrival times.

8.1 The asymptotic ($\alpha \to \infty$) likelihood function

In this section, the basic idea is to asymptotically evaluate the multidimensional sums or integrals. Note that, when $\alpha^2 \to \infty$, the $2n$-fold integrals appearing in the likelihood become increasingly sensitive to the value of the exponent, and in the limit the integral is completely determined by the coordinates that maximize the exponent. This statement is made precise by the multidimensional version of Laplace's theorem¹⁹ which, apart from certain hypothesis-insensitive terms, gives for each $n$

$$\begin{aligned}
\lim_{\alpha \to \infty} &\int_0^{\mathcal{T}} d\mathbf{t}_n \sum_{\mathbf{g}_n}\prod_{i=1}^{n} p(g_i)\prod_{j=1}^{n}\lambda(t_j)\exp\Big\{\alpha^2\Big[\sum_{m=1}^{n} g_m P(t_m) - \tfrac12\sum_{m,k=1}^{n} g_m g_k R(t_m - t_k)\Big]\Big\} \\
&\sim \prod_{j=1}^{n} p(g_j^*)\lambda(t_j^*)\exp\Big\{\alpha^2\Big[\sum_{m=1}^{n} g_m^* P(t_m^*) - \tfrac12\sum_{m,k=1}^{n} g_m^* g_k^* R(t_m^* - t_k^*)\Big]\Big\},
\end{aligned} \tag{93}$$§

where $\{t_1^*, t_2^*, \ldots, t_n^*\}$ and $\{g_1^*, g_2^*, \ldots, g_n^*\}$ maximize the exponent,

$$\sum_{m=1}^{n} g_m P(t_m) - \tfrac12\sum_{m,k=1}^{n} g_m g_k R(t_m - t_k),$$

under the constraint that $0 \le t_i \le \mathcal{T}$, $i = 1, 2, \ldots, n$. The determination of the extremizing sets appears very difficult. For example, without avalanche gain (i.e., $g_m = 1$) and $n = 1$, it is clear that $t_1^*$ is taken at the point where the observable $P(t)$ is a maximum. When $n = 2$, the exponent becomes

$$P(t_1) + P(t_2) - R(t_1 - t_2),$$

and the choice of $t_1$ and $t_2$ is not apparent. The values of $t_1^*$ and $t_2^*$ tend to be near the peaks of $P(t)$, but this is not always the case.* The best choice of $t_1$ and $t_2$ will, of course, depend on the interaction of the random process $P(t)$ and the correlation function $R(t)$. The problem of finding the set of points $\{t_i^*\}$ is in some sense equivalent to whitening or peaking up the pulse $w(t)$ in a nonlinear manner to minimize the noise enhancement concomitant with such an operation. Putting aside for the moment the difficulty of determining $\{t_1^*, \ldots, t_n^*\}$ and $\{g_1^*, \ldots, g_n^*\}$, we can use these values to rewrite the right-hand side of (93) as

$$\Big[\prod_{j=1}^{n} p(g_j^*)\lambda(t_j^*)\Big]\exp\Big\{\alpha^2\Big[\sum_{m=1}^{n} g_m^* P(t_m^*) - \tfrac12\sum_{m,k=1}^{n} g_m^* g_k^* R(t_m^* - t_k^*)\Big]\Big\} \triangleq \gamma_n(\mathcal{T})\exp[\alpha^2\beta_n(\mathcal{T})], \tag{94}$$

† Note that, as $R(t) \to \delta(t)$, the gaussian noise becomes transparent to the receiver (since the integrated received signal would be discontinuous whenever an impulse arrived). The receiver then assumes the form of a counter.

‡ The difference is that, in our application, the input $P(t)$ is exponentiated rather than appearing directly.

§ It has been assumed that there is only one set of variables $\mathbf{t}^* = (t_1^*, \ldots, t_n^*)$ and $\mathbf{g}^* = (g_1^*, \ldots, g_n^*)$ which maximize the exponent. If there are several such $\mathbf{t}^*$ and $\mathbf{g}^*$, then the right-hand side of (93) would consist of a sum of these terms. We do not pursue this approach, since the resulting structure is hopelessly complicated and appears to be impractical.

* This would be the case whenever $P(t)$ has equivalued maxima spaced at least a decorrelation time apart.
where we have indicated the dependence of both the coefficient and the exponent on the observation interval $\mathcal{T}$. Using (94) in (46) gives the Dirichlet series²¹

$$e^{\Lambda}L \sim \sum_{n=0}^{\infty}\gamma_n(\mathcal{T})\exp[\alpha^2\beta_n(\mathcal{T})]. \tag{95}$$

As $\alpha \to \infty$, it is well known that the Dirichlet series is dominated by the term with the largest exponent, i.e.,

$$\lim_{\alpha \to \infty} e^{\Lambda}L \sim \gamma_{n^*}(\mathcal{T})\exp\{\alpha^2\beta_{n^*}(\mathcal{T})\}, \tag{96}$$

where $\beta_{n^*}$ is the largest exponent.* It is evident that $n^*$ is an estimate of the number of (Poisson) events occurring in the interval $\mathcal{T}$ and that $t_1^*, \ldots, t_{n^*}^*$ are estimates of these occurrence times, while $g_1^*, \ldots, g_{n^*}^*$ are estimates of the avalanche gains. This is not surprising since, as $\alpha^2 \to \infty$, the vanishingly small noise implies that these estimates will be quite accurate. Hence, the receiver is intimately related to the situation considered by Bar-David,³ where the Poisson events can be observed directly. The distinction is that estimated arrival times and avalanche gains are used rather than their true values. It is important to realize that specific estimators have been obtained for the random parameters. As we show in the sequel, the simultaneous estimation and detection described above can be recursively implemented via dynamic programming.

Since neither the exponent in (94) nor the $\prod_{j=1}^{n^*} p(g_j^*)$ term is hypothesis-sensitive, the relevant portion of the likelihood function is

$$L \sim e^{-\Lambda}\prod_{j=1}^{n^*}\lambda(t_j^*), \tag{97}$$

where $n^*$ is the number of time points that maximize the exponent of (92) (which, of course, depends on $\mathcal{T}$) and $\{t_j^*\}_{j=1}^{n^*}$ are the values of these time points. Note that, once the exponent is jointly optimized with respect to $t_n$ and $g_n$, the estimate of the avalanche gain is not utilized further. This is so because the avalanche gain is a property of the photodetector and conveys no information concerning the intensity function. The asymptotic ($\alpha \to \infty$) likelihood given by (97) is exactly Bar-David's³ likelihood formula, with the true arrival times replaced by estimated arrival times. Note that the log-likelihood

$$l_m = -\Lambda_m(\mathcal{T}) + \sum_{j=1}^{n^*}\log\lambda^{(m)}(t_j^*) \tag{98}$$

is again a weighted counter, and is similar to (92) derived in Section VII [where the pulses $w(t)$ were assumed to be narrow].

Two shortcomings are associated with the above approach: one is computational and the other involves a question of mathematical rigor. The first point is that implicit in the expression for the likelihood (97) is the ability to solve the formidable mathematical problem,

$$\max_{g_i,\ t_i,\ \text{and } n}\Big[\sum_{m=1}^{n} g_m P(t_m) - \tfrac12\sum_{m,k=1}^{n} g_m g_k R(t_m - t_k)\Big], \qquad 0 \le t_i \le \mathcal{T},\ i = 1, 2, \ldots, n, \tag{99}$$

in real time. We are not aware of optimization techniques capable of this accomplishment. The second point involves the invocation of the large $\alpha^2$ assumption in a sequence of operations. Recall that this assumption was used to derive (93) and then used again to obtain (96). While the validity of the preceding operations can perhaps be demonstrated (under suitable conditions), the intractable nature of (99) forces us to slightly reformulate our problem.

* If the signal-to-noise ratio is not large enough for this to be an accurate approximation, then one could designate $n^+$ as the second largest exponent, thereby developing the more accurate series

$$e^{\Lambda}L \sim \gamma_{n^*}\exp(\alpha^2\beta_{n^*})\Big(1 + \frac{\gamma_{n^+}}{\gamma_{n^*}}\exp[\alpha^2(\beta_{n^+} - \beta_{n^*})]\Big).$$

8.2 The optimum detector when the photon arrival times are discrete 

To proceed further and obtain a physically realizable, as well as
meaningful, detector, we discretize the photon arrival times. Adopting
this approach, the photon arrival times are now constrained to occur
at the discrete instants $j\Delta$ ($j = 0, 1, 2, \ldots, J$, where $J = T/\Delta$).
This gives rise to the discrete likelihood function (41), and eq. (98)
then involves only sums rather than integrals. This modified expression
contains a $(2n + 1)$-dimensional sum, which is recognized as a
bona fide Dirichlet series. Thus, we have avoided the mathematical
question concerning the validity of an asymptotic expansion by introducing
a mild relaxation of the physical set-up.
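
The discretization raises the practical question of how fine the lattice must be. A minimal sketch follows, assuming the intensity is roughly constant over one lattice interval so that the count in that interval is Poisson with mean $x = \lambda\Delta$. The probability of two or more arrivals is then $1 - e^{-x} - x e^{-x}$. The helper `choose_delta`, its halving strategy, and the tolerance `tol` are illustrative assumptions, not part of the paper.

```python
import math

def prob_multiple_arrivals(rate, delta):
    """P(two or more Poisson arrivals) in an interval of length delta,
    assuming the rate is constant over that interval."""
    x = rate * delta
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return -math.expm1(-x) - x * math.exp(-x)

def choose_delta(peak_rate, T, tol):
    """Double the number of lattice points J until the multiple-arrival
    probability at the peak rate is at most tol.
    Returns (delta, J) with delta = T / J."""
    J = 1
    while prob_multiple_arrivals(peak_rate, T / J) > tol:
        J *= 2
    return T / J, J

# Hypothetical numbers: peak intensity of 100 photons per unit time.
delta, J = choose_delta(peak_rate=100.0, T=1.0, tol=1e-3)
print(delta, J, prob_multiple_arrivals(100.0, delta))
```

For small $x$ the probability behaves like $x^2/2$, so halving $\Delta$ reduces it by roughly a factor of four.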

Applying the asymptotic condition to the $(2n + 1)$-variable summation
again produces the expressions (94) to (99), where it is recognized
that the variables $\{t_i\}$ are now constrained to lie on the
lattice, i.e., $t_i = j_i \Delta$, where $j_i = 1, 2, \ldots, J$. We now show that, using
this discrete framework, the exponent appearing in (94) can be rewritten
in a form readily amenable to maximization. Note that the
variables $t_1^*, t_2^*, \ldots, t_n^*$ may be thought of either as specifying a single
point in n-dimensional space or as specifying n points on the interval
(0, T). This latter viewpoint turns out to be more useful.

We choose the time quantization $\Delta$ so that the probability of more
than one photon arrival occurring in a time interval of size $\Delta$ is vanishingly
small,* under each hypothesis $\lambda^{(m)}(t)$. In this framework, the set
of time points $\{t_k^*\}$ specifies n points in the interval (0, T), and the
exponent can be rewritten as

$$\sum_{m=1}^{n} g_m P(t_m^*) - \frac{1}{2} \sum_{m,k=1}^{n} g_m g_k R(t_m^* - t_k^*) = \sum_{m=1}^{J} g_m q_m P(m\Delta) - \frac{1}{2} \sum_{m,k=1}^{J} g_m g_k q_m q_k R(m\Delta - k\Delta), \tag{100}$$

where $T = J\Delta$ and where $q_m$ is 0 or 1. A value of $q_m = 1$ implies
that the time point $m\Delta$ is "active" in the sums appearing in (100),
while $q_m = 0$ implies that it is not. If one chooses $\Delta$ to provide a
coarser quantization of the time axis, as might be required by practical
restrictions on the sampling rate, then it is necessary to allow $q_m$ to
assume more (integer) values than 0 and 1. To see why this must be
the case, recall the physical meaning† of the time points $\{t_i^*\}$. It is then
realistic to expect that more than one photon will have arrived in a $\Delta$
interval and consequently some $t_i^* = t_j^*$ (for $i \ne j$). The increased range
of $q_m$ is necessary to accommodate this situation. Realizing that no
restriction is implied, for reasons of simplicity we assume in the sequel
that $\Delta$ is chosen small enough so that $q_m = 0$ or 1. At this point, it is
clear that the product $g_m q_m$ is inseparable in the optimization of (100).
Note that, once the optimum values of $q_m$ and $g_m$ are determined, only
the value of $q_m$ plays a further role in the detection procedure. With this
in mind, we let $\beta_m = q_m g_m$, where $\beta_m$ will range over the allowable
values of $g_m$ as well as zero. For convenience, we call this discrete set
B. In the context of this new notation, the optimization problem posed
in (99) becomes

$$\max_{\substack{\beta_1, \ldots, \beta_J \\ \beta_m \in B}} \left[ \sum_{m=1}^{J} \beta_m P(m\Delta) - \frac{1}{2} \sum_{m,k=1}^{J} \beta_m \beta_k R(m\Delta - k\Delta) \right], \tag{101}$$

where it is important to realize that the maximization with respect to
n, appearing in (99), has been removed in (101) by eliminating the
restriction that only a predetermined number of $q_n$'s be nonzero. It is
also apparent that the exponent is of the required recursive form, so
that it can be maximized via the Viterbi algorithm. With
this in mind, the likelihood function can now be written as

$$e^{-\Lambda} \prod_{j=1}^{J} [\lambda(j\Delta)]^{q_j}, \tag{102}$$

and the log-likelihood again assumes the weighted-counter form

$$l = -\int_0^T \lambda(t)\,dt + \sum_{j=1}^{J} q_j \log[\lambda(j\Delta)], \tag{103}$$

which is similar to the detector described by (92) but without the
restriction on the correlation function R(t), i.e., R(t) need not be confined
to an interval $\Delta$. The result embodied in (92) for nonoverlapping
pulses can be easily derived from (101) by setting $R(m\Delta - k\Delta) = \delta_{k-m}$.
The exponent then becomes $\sum_{k=1}^{J} [\beta_k P(k\Delta) - \frac{1}{2}\beta_k^2]$, which is
optimized, over the integer values of $\beta_k$, by choosing $\beta_k$ to be the
quantized version of $P(k\Delta)$.

* This probability is $1 - e^{-\lambda\Delta} - \lambda\Delta\, e^{-\lambda\Delta}$, which is of order $(\lambda\Delta)^2$.

† The $\{t_i^*\}$ are estimates of the pulse arrival times.
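
The maximization in (101) can be sketched as a Viterbi-style dynamic program when the correlation function has finite memory. The sketch below assumes that $R$ is symmetric and vanishes beyond lag $K$ (the list `R` holds $R(0), R(\Delta), \ldots, R(K\Delta)$), and that all $\beta_m$ before the start of the interval are zero. The function names and the test values are hypothetical. It processes the whole block at once and does not model the real-time merge behavior discussed in the text.

```python
from itertools import product

def exponent(beta, P, R):
    """Directly evaluate sum_m beta_m P_m - 0.5 sum_{m,k} beta_m beta_k R(|m-k|)."""
    J = len(P)
    lin = sum(beta[m] * P[m] for m in range(J))
    quad = sum(beta[m] * beta[k] * R[abs(m - k)]
               for m in range(J) for k in range(J) if abs(m - k) < len(R))
    return lin - 0.5 * quad

def viterbi_exponent(P, R, B):
    """Maximize the exponent over beta in B^J by dynamic programming.
    The state is the tuple of the K most recent beta values."""
    K = len(R) - 1
    survivors = {(0,) * K: (0.0, [])}  # state -> (best metric, best path)
    for p in P:
        new = {}
        for state, (metric, path) in survivors.items():
            # Interference from the K previous lattice points; state[K - d] is beta_{m-d}.
            isi = sum(R[d] * state[K - d] for d in range(1, K + 1))
            for b in B:
                cand = metric + b * p - 0.5 * R[0] * b * b - b * isi
                nxt = (state + (b,))[1:]
                if nxt not in new or cand > new[nxt][0]:
                    new[nxt] = (cand, path + [b])
        survivors = new
    return max(survivors.values(), key=lambda mp: mp[0])

# Hypothetical samples P(m*Delta), correlation values, and allowable set B.
P = [0.2, 1.4, 0.9, -0.3, 2.1, 0.1]
B = [0, 1, 2]
best_metric, best_path = viterbi_exponent(P, [1.0, 0.4, 0.1], B)
brute = max(exponent(list(b), P, [1.0, 0.4, 0.1]) for b in product(B, repeat=len(P)))
print(best_path, best_metric, brute)
```

With `R = [1.0]` (nonoverlapping pulses) the recursion decouples and each $\beta_k$ becomes the quantized version of $P(k\Delta)$, in agreement with the special case noted above.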

The structure of the optimum detector (103) is shown in Fig. 11,
and is of the estimator-detector type. The arrival time indicators
$\{q_j\}_{j=1}^{J}$ (as well as the avalanche gains) are determined by applying
the Viterbi algorithm to the exponent. Once these values are available,
the likelihood is computed for each hypothesis $\lambda^{(m)}(t)$ and the maximum
is selected.
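
The final selection step can be illustrated with a short sketch of the weighted counter (103). It assumes the arrival indicators $q_j$ have already been estimated, and it approximates the integral of each intensity by a Riemann sum on the same lattice. The functions `weighted_counter` and `detect`, and the two example intensities, are hypothetical.

```python
import math

def weighted_counter(q, intensity, delta):
    """l = -integral of lambda(t) dt + sum_j q_j log lambda(j*delta), cf. (103).
    The integral is approximated by a Riemann sum over j = 1, ..., J."""
    J = len(q)
    energy = sum(intensity(j * delta) for j in range(1, J + 1)) * delta
    count = sum(qj * math.log(intensity(j * delta))
                for j, qj in enumerate(q, start=1) if qj)
    return -energy + count

def detect(q, hypotheses, delta):
    """Return (index of the hypothesis with the largest weighted count, all scores)."""
    scores = [weighted_counter(q, lam, delta) for lam in hypotheses]
    return max(range(len(scores)), key=scores.__getitem__), scores

# Two hypothetical intensities on (0, 1): a pulse in the first or second half.
def lam_early(t):
    return 21.0 if t < 0.5 else 1.0

def lam_late(t):
    return 1.0 if t < 0.5 else 21.0

delta, J = 0.01, 100
q = [0] * J
for j in (10, 20, 30):          # estimated arrivals at t = 0.1, 0.2, 0.3
    q[j - 1] = 1
print(detect(q, [lam_early, lam_late], delta))
```

Each estimated arrival contributes the weight $\log \lambda^{(m)}(j\Delta)$, so arrivals that fall where a hypothesis predicts high intensity raise that hypothesis's score.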

8.3 Optimum detection of PAM intensities 

The above methodology is now applied to the optimum detection of
a digital (PAM) data signal. The $2^{N+1}$ intensity functions in this situation
are given by

$$\lambda(t) = \lambda_0 + \sum_{n=0}^{N} a_n f(t - nT)$$

over the observation interval, where the effect of optical channel
distortion (intersymbol interference) is included in f(t).

To optimally detect these signals, it is convenient to rewrite the
original likelihood expression so that time is directly expressed in
units of $\Delta$. Bringing out this dependence, the likelihood function then
becomes

$$L_J = \exp\left[ -\int_0^{J\Delta} \lambda(t)\,dt \right] \prod_{j=1}^{J} [\lambda(j\Delta)]^{q_j}, \tag{104}$$

where the index J designates time in units of $\Delta$. Note that the exponent
(101) is already expressed in this form.

Fig. 11 — Estimator-detector type of weighted counter.

It is important to emphasize that a simultaneous or two-tier real-time
sequential optimization procedure is required to extract the ML
estimate of the data sequence, $\{a_n\}_{n=0}^{N}$. The exponent is first maximized
with respect to the $\{\beta_j\}_{j=1}^{J}$, and the corresponding $q_j$ values are then
used to maximize (104) with respect to the data symbols. The optimization
of the exponent is identical to that occurring in ML data sequence
estimation in the presence of intersymbol interference. 19 The maximization
of the exponent will, at random intervals,* produce optimum
values of $\{q_j\}$, say, $\hat{q}_0, \hat{q}_1, \ldots, \hat{q}_k$. At this instant, the optimization of
the likelihood $L_k$ can then proceed using this new information. At
some later instant, $\hat{q}_{k+1}, \hat{q}_{k+2}, \ldots, \hat{q}_{k+n}$ will become available and
attention again shifts to maximizing the likelihood $L_{k+n}$. As we shall
show, the dynamic programming algorithm which maximizes the
coefficient (103) is quite different from the conventional Viterbi
algorithm. In fact, the application of dynamic programming to the
iterative† maximization of this function illustrates the more general
principle that dynamic programming is applicable to the iterative
real-time ML sequence estimation of digital data that has undergone
a wide variety of nonlinear distortion. The only requirements are
that (i) the likelihood possesses the mathematical property of additivity
and (ii) the nonlinearity is of finite memory so that the notion
of a "state" is meaningful. In this application, both these requirements
are satisfied.

To apply dynamic programming to the optimization problem exhibited
in (103), we need only show that the likelihood satisfies a particular
recursive form. To put the likelihood in this recursive form, we
define the state vector

$$S_j = (a_{j-1-\eta}, \ldots, a_j), \quad j = \mathcal{J}, \mathcal{J} + 1, \ldots, J, \tag{105}$$

where $\mathcal{J}$ is the memory (in units of $\Delta$) of the dispersed pulse f(t), i.e.,

$$f(n\Delta) = 0, \quad n > \mathcal{J}, \tag{106}$$

and where $\eta$ is the closest integer to $\mathcal{J}\Delta/T$.

* Owing to the merge aspect of the Viterbi algorithm.

† The two main virtues of dynamic programming are that (i) it is essentially a
real-time processing scheme (although there is random signal-processing delay) and
(ii) the number of computations is linearly proportional to time (n), as opposed to
a straightforward evaluation that requires an exponentially growing number of
computations.

As the optimum $\{\hat{q}_j\}$ time instants emerge from the Viterbi algorithm
in a random manner (owing to the merge mechanism), they
are classified according to the time segment (0, NT) to which they belong.
Once optimum time instants begin appearing that are active in the
(N + 1)T time segment, those optimum $q_n$'s which are in the NT time
segment are available to maximize the coefficient or, equivalently, the
likelihood.

By substituting the PAM signal into (104), the log-likelihood has
the form

$$L_J = -\sum_{n=0}^{N} a_n F_n + \sum_{j=0}^{J} q_j \log\left( \lambda_0 + \sum_{m=0}^{N} a_m f(j\Delta - mT) \right), \tag{107}$$

where J is now interpreted as the index of the latest‡ merge in the
Viterbi algorithm associated with the time interval (0, NT) and

$$F_n = \int_0^{J\Delta} f(t - nT)\,dt. \tag{108}$$

It is important to keep in mind the fact that, once the decisions
$(\hat{q}_1, \hat{q}_2, \ldots, \hat{q}_J)$ are available, the iterative procedure for maximizing
the likelihood proceeds in units of T. The log-likelihood can be put in
the required form by letting $D = T/\Delta$ and writing the likelihood as

$$\begin{aligned}
L_N = {} & -\sum_{n=0}^{N-1} a_n F_n + \sum_{j=1}^{ND-D} q_j \log\left( \lambda_0 + \sum_{m=N-1-\mathcal{J}/D}^{N-1} a_m f(j\Delta - mD\Delta) \right) \\
& - a_N F_N + \sum_{j=ND-D+1}^{ND} q_j \log\left( \lambda_0 + \sum_{m=N-1-\mathcal{J}/D}^{N} a_m f(j\Delta - mD\Delta) \right).
\end{aligned} \tag{109}$$

It is crucial to realize that the last term in (109) only involves $a_{N-1-\eta}$,
$a_{N-1-\eta+1}, \ldots, a_N$; therefore, with the state vector defined by (105),
(109) can be written as

$$L_N = L_{N-1} + h(\mathbf{q}_N; S_N), \tag{110}$$

where

$$\mathbf{q}_N = (q_{ND-D+1}, \ldots, q_{ND}). \tag{111}$$

‡ In other words, the next segment of optimum $q_n$'s will penetrate beyond the time
instant NT.

Fig. 12 — Two-tier dynamic programming algorithm.

It is well known that, through the use of the recursion (110), dynamic
programming may be applied to the maximization of $L_N$.
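
The recursion (110) can be sketched as a second dynamic program over the data symbols. The version below is a simplified illustration under stated assumptions: the pulse is causal and given by its samples `fs[i]` $= f(i\Delta)$, samples are indexed from zero so that symbol interval $n$ covers $j = nD, \ldots, nD + D - 1$, the pulse area $F$ is taken to be the same for every symbol, and all arrival indicators are available at once (the random merge delays of the first tier are not modeled). All names and numerical values are hypothetical.

```python
import math
from itertools import product

def log_likelihood(symbols, q, fs, D, lam0, delta):
    """Direct evaluation of -sum_n a_n F + sum_j q_j log(lam0 + sum_m a_m f(j*delta - m*T))."""
    F = sum(fs) * delta
    total = -F * sum(symbols)
    for j, qj in enumerate(q):
        if qj:
            lam = lam0 + sum(a * fs[j - m * D] for m, a in enumerate(symbols)
                             if 0 <= j - m * D < len(fs))
            total += qj * math.log(lam)
    return total

def branch_metric(n, a_n, state, q, fs, D, lam0, F):
    """h(q_N; S_N): contribution of symbol interval n given the previous symbols in state."""
    window = state + (a_n,)                      # (a_{n-eta}, ..., a_n)
    h = -a_n * F
    for j in range(n * D, (n + 1) * D):
        if q[j]:
            lam = lam0
            for back, a in enumerate(reversed(window)):   # back = n - m
                i = j - (n - back) * D
                if i < len(fs):
                    lam += a * fs[i]
            h += q[j] * math.log(lam)
    return h

def ml_sequence(q, fs, D, lam0, alphabet, delta):
    """Maximize the log-likelihood over symbol sequences via L_N = L_{N-1} + h."""
    eta = (len(fs) - 1) // D                     # pulse memory in symbol intervals
    F = sum(fs) * delta
    survivors = {(0,) * eta: (0.0, [])}          # state -> (metric, path)
    for n in range(len(q) // D):
        new = {}
        for state, (metric, path) in survivors.items():
            for a in alphabet:
                cand = metric + branch_metric(n, a, state, q, fs, D, lam0, F)
                nxt = (state + (a,))[1:]
                if nxt not in new or cand > new[nxt][0]:
                    new[nxt] = (cand, path + [a])
        survivors = new
    return max(survivors.values(), key=lambda mp: mp[0])

# Hypothetical example: 4 binary symbols, D = 4 samples per symbol.
q = [0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
fs = [0.0, 3.0, 5.0, 4.0, 2.0, 1.0]
print(ml_sequence(q, fs, 4, 0.5, (0, 1), 0.25))
```

The number of survivor states is fixed by the pulse memory, so the work per symbol interval does not grow with the length of the sequence, whereas the direct evaluation over all sequences grows exponentially.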

The resulting receiver is depicted in Fig. 12, and is a two-tier dynamic
programming algorithm that simultaneously iterates the exponent
and the coefficient to obtain a sequential (or real-time) maximum
likelihood sequence estimate of the transmitted sequence $\{a_n\}$.
While the above detector requires sampling at a rate that could preclude
practical implementation, we remark that, in the large-$\alpha^2$ environment,
a peak detector could be used to estimate the photon
arrival times. These estimated arrival times would then be used in
a dynamic programming algorithm to mitigate the effect of intersymbol
interference.
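
As a rough illustration of the suboptimum alternative just mentioned, the sketch below estimates arrival times as thresholded local maxima of the sampled photodetector output. The paper does not specify a particular peak detector; the function `peak_arrival_times`, the threshold rule, and the sample values are assumptions made only for this example.

```python
def peak_arrival_times(samples, delta, threshold):
    """Return the times j*delta at which the sampled output has a local
    maximum above threshold (a hypothetical, suboptimum estimator)."""
    times = []
    for j in range(1, len(samples) - 1):
        rising = samples[j - 1] < samples[j]
        not_rising_after = samples[j] >= samples[j + 1]
        if samples[j] > threshold and rising and not_rising_after:
            times.append(j * delta)
    return times

# Hypothetical sampled output with two pulses riding on small noise.
samples = [0.0, 0.1, 0.9, 0.2, 0.0, 0.05, 1.1, 1.1, 0.3, 0.0]
print(peak_arrival_times(samples, delta=0.5, threshold=0.5))
```

The resulting times could then be mapped to lattice indicators $q_j$ and passed to the symbol-level dynamic program, at the cost of the optimality of the arrival-time estimates.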

IX. DISCUSSION 

The communication-theoretic model for the fiber-optic communication
system has proven to be quite useful. Using this model, the optimum
(maximum-likelihood) receiver was exhibited under a wide variety
of physical circumstances for M-ary and digital PAM signaling.
Whether or not the energy in the response of the photodetector to an 
individual photon is large or small compared to the background 
gaussian noise, the detector structure turned out to be a weighted 
counter. The details of how the weighting is carried out have been 
shown to be complex in some cases. Further investigation into system 
performance is needed before assessing whether or not such complexity 
is warranted in any particular application. For values of pulse energy-to-noise
ratio ($\alpha^2$) much less than unity, the structure of the optimum
detector can be simply instrumented in terms of analog operations on
the photodetector output. On the other hand, when $\alpha^2 \gg 1$, and with
or without avalanche gain, we have been unable to realize the optimum
detector without first sampling the photodetector output many times
per symbol interval. This procedure may impose practical limitations 
on the implementation. Since the digital operations are required solely 
to estimate the photon arrival times, it has been pointed out that 
certain suboptimum operations (such as peak detection) may be used 
to estimate these instants. The power of maximum likelihood process- 
ing can still be used to mitigate the effect of intersymbol interference. 

From a communications and information theoretic point of view,
there remain many important and as yet untouched problems associated
with the fiber-optic channel. Sharp bounds on the performance
of the various detectors are extremely difficult to obtain, and very
little can be said at this time. Also, questions concerned with capacity,
reliability, and complexity need to be addressed.

APPENDIX

Optimum Binary Intensities In the Absence of Gaussian Noise 

In this appendix, we determine the optimum binary intensities
$\lambda_1(t)$ and $\lambda_2(t)$ in the absence of gaussian noise. We proceed initially
by neglecting the dark current. Of course, the optimum intensities
must satisfy an energy constraint†

$$\int_0^T [\lambda_1(t) + \lambda_2(t)]\,dt = P. \tag{112}$$

Consider the performance of a system that uses the equiprobable
intensities

$$\lambda_1(t) = P/T, \qquad \lambda_2(t) = 0, \qquad 0 \le t \le T. \tag{113}$$

The only way an error can be made under (113) is when $\lambda_1(t)$ is transmitted
and no photons arrive; the probability of this event is

$$P_I = \tfrac{1}{2} e^{-P}. \tag{114}$$

† Since the intensity is proportional to the transmitted optical energy, the constraint
is on the average energy.

Consider now the performance of a system that uses the arbitrary
and equiprobable intensities $\lambda_1(t)$ and $\lambda_2(t)$. The probability of error
for this system is

$$P_{II} = \tfrac{1}{2} P_1 + \tfrac{1}{2} P_2, \tag{115}$$

where $P_1$ and $P_2$ denote the conditional error probabilities given that
$\lambda_1(t)$ and $\lambda_2(t)$ are active. Let

$$\Lambda_i = \int_0^T \lambda_i(t)\,dt, \quad i = 1, 2, \tag{116}$$

and let $\Lambda_1$ be greater than $\Lambda_2$. It is clear that, when $\lambda_1(t)$ is transmitted,
the optimum detector must make an error when there are no photon
arrivals. These observations provide the following sequence of lower
bounds

$$P_{II} \ge \tfrac{1}{2} P_1 \ge \tfrac{1}{2} e^{-\Lambda_1}, \tag{117}$$

and since $\Lambda_1 + \Lambda_2 = P$ we have

$$P_{II} \ge \tfrac{1}{2} e^{-\Lambda_1} \ge \tfrac{1}{2} e^{-P} = P_I. \tag{118}$$

It is thus established that the intensities described by (113) minimize
the probability of error and therefore are optimum. It is also clear that
any system that has one of the intensities equal to zero, and the other
arbitrary (and satisfying the power constraint), will perform
as well as (113).
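
The bound (118) can be checked numerically in a simple special case. The sketch below assumes both intensities are constant over (0, T), so that the photon count is a sufficient statistic and the minimum error probability for equiprobable hypotheses is $\frac{1}{2}\sum_k \min\{p_1(k), p_2(k)\}$, where $p_i$ is the Poisson distribution with mean $\Lambda_i$. The function names, the truncation `kmax`, and the value $P = 5$ are illustrative assumptions.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k events; a zero mean puts all mass on k = 0."""
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def error_probability(mean1, mean2, kmax=200):
    """Minimum error probability for deciding between two equiprobable
    Poisson counts with means mean1 and mean2 (constant intensities assumed)."""
    return 0.5 * sum(min(poisson_pmf(k, mean1), poisson_pmf(k, mean2))
                     for k in range(kmax + 1))

P = 5.0                                  # hypothetical energy constraint
p_on_off = error_probability(P, 0.0)     # the on-off pair of (113)
print("on-off:", p_on_off, "half exp(-P):", 0.5 * math.exp(-P))
for mean1 in (4.5, 4.0, 3.0, 2.6):
    # Every other split of the energy does no better than the on-off pair.
    print(mean1, P - mean1, error_probability(mean1, P - mean1))
```

The on-off pair attains $\frac{1}{2}e^{-P}$ exactly, while the error probability of the other splits grows toward one half as the two energies approach each other.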

The effect of dark current on the probability of error can be made
arbitrarily small by choosing $\lambda_2(t) = 0$ and picking $\lambda_1(t)$ so that the set
of points where $\lambda_1(t)$ is nonzero is sufficiently small.